Abstract


Totally-Endoscopic Coronary Artery Bypass operations, which should ideally be entirely per- 
formed with a robotic surgical assistant, currently suffer from a high rate of conversion to more 
traditional procedures, partly due to the difficulty of identifying the coronary artery that is 
to be operated upon. One solution to this problem is to guide the surgeon by superimposing 
models of patients’ hearts onto the images provided by the robot. Motivated by the possibility 
of using motion information to (partially) constrain the registration of the model to the images, 
this thesis focuses on methods of estimating the motion of salient patches on the myocardial
surface.


We begin by introducing an importance sampling algorithm for hypothesising affine patch 
transformations in a particle filtering framework. The algorithm minimises uncertainty by 
multiplicatively combining information from multiple patch subregions. We devise a method for 
handling missing information based on empirical evidence that suggests that certain importance
sampling probability ratios grow exponentially with the number of subregions.
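
As a minimal sketch of the multiplicative combination described above, the following Python fragment multiplies the likelihoods that several subregions assign to a shared set of candidate hypotheses, and then samples from the normalised product. The function names, the small discrete hypothesis set and the numerical values are illustrative assumptions; they are not the algorithm developed in this thesis, which hypothesises affine patch transformations within a particle filter.

```python
import math
import random


def combine_experts(log_likelihoods):
    """Multiply per-subregion likelihoods over a shared set of hypotheses.

    log_likelihoods[k][h] is the log-likelihood that (hypothetical)
    subregion k assigns to hypothesis h. Returns normalised probabilities.
    """
    n_hyp = len(log_likelihoods[0])
    # A product of likelihoods is a sum of log-likelihoods.
    totals = [sum(expert[h] for expert in log_likelihoods) for h in range(n_hyp)]
    # Subtract the maximum before exponentiating, for numerical stability.
    peak = max(totals)
    weights = [math.exp(t - peak) for t in totals]
    norm = sum(weights)
    return [w / norm for w in weights]


def sample_hypotheses(probs, n, rng):
    """Draw n hypothesis indices from the combined distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


# Three made-up subregions scoring four candidate displacements.
experts = [
    [math.log(p) for p in (0.1, 0.6, 0.2, 0.1)],
    [math.log(p) for p in (0.2, 0.5, 0.2, 0.1)],
    [math.log(p) for p in (0.1, 0.7, 0.1, 0.1)],
]
probs = combine_experts(experts)
samples = sample_hypotheses(probs, 5, random.Random(0))
```

In this toy example every subregion favours the second hypothesis, so the product concentrates most of the probability mass on it; each additional agreeing subregion multiplies the ratio between the favoured hypothesis and its competitors by a further factor.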


We then describe methods for calculating the dissimilarity between image regions whilst taking 
into account specular reflections and illumination changes. We achieve insensitivity to these 
effects by explicitly removing illumination changes, tentatively masking out specular reflections,
and ignoring pixel differences that exceed a percentile of a weighted pixel difference distribution.


Next, we investigate myocardial deformation sequence models, and propose nesting a PCA 
model of the static deformations within a periodic B-spline-based PCA model of the deformation 
sequences. We use this model to simulate data sets that we can use to approximate maximum 
likelihood estimates of some parameters of the particle filter components, and we describe a 
way of testing whether or not particles have entered low-probability states in which they cease
to contribute useful information.


Acknowledgements 


First and foremost, I would like to thank my supervisor Daniel Rueckert for his guidance, 
patience, willingness to give me complete freedom to explore my own ideas, and for not giving 
up on me despite the numerous dead ends I encountered. I am also grateful to: my second 
supervisor Eddie Edwards, particularly for assisting me in acquiring the intraoperative data I 
used, and for not giving up on me despite my occasional late arrivals at the operating theatre; 
the EPSRC for grant EP/C523008/1 that funded the project that my work was a part of; and 
the Department of Computing at Imperial College London for granting me the DTA award 
that funded my studies. 


I am also inexpressibly grateful to the many wonderful friends I have made over this
four-year journey. In particular, I would like to thank: Paul for his mathematical advice, his
music/comedy recommendations, and for not giving up on me despite me bombarding him with 
emails on topics as diverse as differential geometry and “quantum-mechanical equestrianism”;
Delphine, Ute, Fiona and Karim for the many concerts, restaurants, trips etc. that we have 
enjoyed together, and Delphine especially for all the fun music-making sessions and for not 
giving up on me despite my crippling stage fright that made many of our chamber recitals less 
splendid than they could have been; Dong Ping for her wonderful assortment of Chinese teas, 
for sneaking into lecture theatres with me to covertly watch films on the weekends, for regularly 
pummelling my back with her unexpectedly powerful fists, and for the many emails she sent 
me that consisted of nothing but a single question mark; Alexei and Uri for providing me with 
countless, much-needed “short breaks” that, with probability 1, turned into long evenings of 
unashamedly ultra-geeky humour; Arnaud and Ralph for all the jazz jam sessions and for not 
giving up on me despite my poor style of improvisation; Maria for our jazz singing sessions; 
Claire for her enthusiastic championing of Linux and the Liberal Democrats (not to suggest 
that I am particularly partial towards the latter); Kanwal for her multilingual birthday wishes; 
and the Imperial College Symphony Orchestra for giving me the opportunity to spend seven
years playing so much of the most glorious music I know.


However great the magnitude of the inexpressibility of my gratitude to my friends may be,
my gratitude to my family must, at the very least, be a point of even greater magnitude in
the Banach space of ineffable things, if not an entire subspace of such points covering the
uncountably infinite ways in which I cherish them. This is a corollary to the following lemmas
(I leave the proof as an exercise for the reader): 1) were it not for my parents, I would not
be; 2) were it not for my family, I would not be the person that I am; 3) were it not for my
family’s inexhaustible love, encouragement and laughter, I would not have embarked upon this
long academic voyage in the first place.


And finally, I would like to thank the makers of the fine Colombian coffees that I have been
consuming in unhealthily large doses over the past few months. Were it not for their efforts, I
would probably have less of a headache now, but also less of a thesis.


Dedication 


To the many wonderful teachers I have had 


“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” 


Aaron Levenstein 


Contents

Abstract

Acknowledgements

1 Introduction
    1.1 Motivation and Objectives
    1.2 Contributions
    1.3 Statement of Originality

2 Related Work

3 A Product-of-Experts Approach to Importance Sampling
    3.1 Synopsis
    3.2 Affine Patch Tracking with a Particle Filter
        3.2.1 Probabilistic Approximations
    3.3 Product-of-Local-Likelihoods Importance Sampling
        3.3.1 [title illegible in source]
        3.3.2 Discrete, Bounded-Displacement Energy Functions
        3.3.3 A First-Order Approximation of [remainder illegible in source]
        3.3.4 Local Likelihood Energy Problems
    3.4 Results: Exponential Likelihood Ratio Growth Study
    3.5 Conclusions

4 Likelihood Evaluation
    4.1 Synopsis
    4.2 Robust Image Region Comparison
    4.3 Lightness Normalisation Filtering
        4.3.1 [title illegible in source]
        4.3.2 Related Work
        4.3.3 Local Lightness Normalisation
        4.3.4 Parametric Surface Fitting
    4.4 Specular Highlight Masking
    4.5 Results
        4.5.1 [title illegible in source]
        4.5.2 Comparative Tests
    4.6 Conclusion

Issues to be Addressed in the Remainder of this Thesis

5 Deformation Modelling
    5.1 Synopsis
    5.2 Dense Deformation Field Estimation
    5.3 Deformation Mode Modelling
        5.3.1 Deformation Field Inversion
        5.3.2 Affine Point Set Registration
        5.3.3 Sampling Phase Offsets and Canonical Grid Similarity Transformations
        5.3.4 PCA-Based Deformation Mode Models
        5.3.5 Polar Coordinate PCA-Based Deformation Mode Models
    5.4 Deformation Trajectory Mode Modelling
        5.4.1 Deformation Trajectory Models
        5.4.2 B-Spline Inner Product Spaces
        5.4.3 A B-Spline-Based Gaussian DTM
    5.5 Results
        5.5.1 Deformation Mode Models
        5.5.2 Deformation Trajectory Mode Models
    5.6 Conclusions

6 Towards a Strategy for Coping with Particle Loss
    6.1 Synopsis
    6.2 Introduction
    6.3 Detecting Particle Loss
        6.3.1 Likelihood-Ratio-Based Particle Loss Inference
    6.4 Parameterising the Particle Loss Test
        6.4.1 DTM-Simulation-Based Sample Set Creation
        6.4.2 Simulation Details
        6.4.3 Stretched Exponential Likelihood Parameterisation
        6.4.4 Selecting the Test Threshold
    6.5 Some Methods for Restoring Lost Particles
    6.6 Results
        6.6.1 [title illegible in source]
        6.6.2 Particle Filter Accuracy
        6.6.3 Comparative Tests
    6.7 Conclusion

7 Conclusion
    7.1 Summary of Thesis Achievements
    7.2 Future Work

A Notational Conventions

B The After Math
    B.1 Calculating the Decomposition A = RSK
    B.2 The Existence and Realness of 2 × 2 Matrix Logarithms
    B.3 Calculating the Discretisation Resolution Parameters
        B.3.1 The Rotational Discretisation Parameter
        B.3.2 The Skewing Discretisation Parameters
        B.3.3 The Scaling Discretisation Parameters
    B.4 Maximising the Importance Sampling Distribution Entropy
    B.5 Maximising a Particle Set’s Entropy

C Algorithms

Bibliography


List of Tables

3.1 The values of the particle filter parameters that we manually defined. The unit
p stands for “pixels”. Unless otherwise stated, we use the same values for all tests
in this thesis. Note: € is a dissimilarity function that we will use to define Ap
[remainder illegible in source]

3.2 The percentiles of the number of subregions that defined the particles’ sampling
distributions.

5.1 The parameters we used for each type of canonical grid and their degrees of
freedom (DoF). The parameter symbols are the same as those we used in section
5.3.3. As mentioned earlier, the innermost rings of the unconstrained quasi-circular
grids have just 2 DoF, because they are degenerate. However, the innermost
rings of the constrained quasi-circular grids (including the polar grids)
have 0 DoF, as their vertices always lie at the origin.

5.2 This table shows the ranges of diameters and maximum edge lengths of the source
grids. The maximum edge length of a grid is the maximum distance between
neighbouring vertices. The diameters of the square grids are calculated as the
distances between diagonally-opposite vertices. Their range of side lengths was
[remainder illegible in source]

5.3 This table shows the percentages of source grids that contained at least as many
landmarks as are indicated in the table’s body.

5.4 Landmark sequences: summary statistics.

5.5 Sampled values of M for which the 98th percentiles of the “Square” and “Circular,
Cartesian” landmark reconstruction errors fell within the 7 pixel limit for
[remainder illegible in source]

5.6 Sampled values of M for which the 95th percentiles of the “Square” and “Circular,
Cartesian” landmark reconstruction errors fell within the 7 pixel limit for
[remainder illegible in source]

6.1 This table summarises the parameters that we used to define the distributions
of the perturbations that we applied when constructing the D, set of
patch/subregion differences. As always, the unit p stands for pixels.


List of Figures 


1.1 The da Vinci system (a) and a close-up of its endoscope (b). The two large
circles at the bottom of the endoscope are the stereo cameras, and the smaller
circle above them is a light. The copyright of these images is owned by Intuitive
Surgical [51].
for the 230108.FO1 video. In all subfigures, the x axis denotes values of 7G
their source code. (c) An intraoperative image used by Stoyanov and Yang. (d)
Stoyanov and Yang’s result after removing specular highlights from (c). Images
RGB space. (e), (k) The lightness normalised L* channels. (f), (l) All of the S’
An example of the trajectories followed by 5 manually-tracked points over a
cardiac cycle, with units in pixels. The circular markers denote the locations of 
the points at 5-frame intervals. Each trajectory begins at the top-left marker. 
The range of displacements of the points, relative to their initial positions, is 
up to 300 pixels horizontally and 200 pixels vertically. The apparent lack of
periodicity is due to the effects of respiratory motion.
(a) The reference image from the 180608.HS sequence with manually-placed
landmarks, with units in pixels. Lightness normalisation has not been applied. 
(b) The image from the sequence that appeared to have the greatest amount 
of residual motion after estimating the deformation and backprojecting. (c) 
The estimated deformation, with the landmarks from image (b) shown. The 
red pluses show the true landmark positions, and the blue crosses show the 
deformation field’s estimates of their positions. (d) Close-up of the reference
image. The indicated convex hull is the region of interest. (e) The result of 
backprojecting image (b). (f) The difference between the reference image and the 
backprojected image. Note that difference images do not highlight deformation 
(a) The time t, regular grid G is deformed by 4. Each coloured quadrilateral
v;(.S) in the deformed grid corresponding to a grid square S of G has a bounding
box indicated with dashed lines and is added to H(S’), for each square S’ of G’
shaded in a similar colour to v;(S). (b) The bilinear interpolation coefficient a,
which is used in the calculation of the position of v;'(p) in the grid square of G
(a) The 10 × 10 square canonical grid that we used in our experiments, with
2 × 10 × 10 = 200 degrees of freedom. (b) A 6 ring, 20 sector quasi-circular
canonical grid that we used. The 20 points of the degenerate innermost ring will
always be collocated after any transformation, hence the grid has 2 × (5 × 20 + 1) =
202 degrees of freedom. The two grids have equal area and a similar number of
(a) A point p’ uniformly sampled within parallelogram Dwazy can be linearly
mapped to a point p in triangle Away, giving a uniformly-distributed sample 
within that triangle. (b) After selecting 0, p and ®, the smallest distance u from
p to S,’s boundary determines the upper bound on the scale factor a.
This figure shows an example of the result of fitting a B-spline to a quasi-periodic
deformation vector subtrajectory. The crosses mark the subtrajectories of the 
first two principal components of the deformation mode model and of two of 
the matrix logarithm entries, and the solid lines of matching colours are the
maximum likelihood B-splines that model them. The vertical red dashed line
marks the time point and the vertical black solid line behind it marks
the candidate phase offset calculated over the B-spline using a method that we 
will describe in the next chapter. The data comes from a test based on square 
canonical grids over the 230108.FO1 data set that our results in the next section 
Cumulative distributions of deformation field landmark reconstruction errors (in
Six deformation fields for video 180608.HS. The red pluses show the true
landmark positions, and the blue crosses show the deformation fields’ estimates of
These figures show the mean and first 10 modes of the deformation mode models
for all 5 grid types. The top row contains the means, and the remaining rows 
show the modes, with eigenvalues decreasing from top to bottom. To highlight 
the kinds of deformations that the modes represent, we have scaled them all by 
DTM landmark reconstruction error percentiles for 230108.FO1 under different
numbers of B-spline control points (in pixel units). See text for description.

DTM landmark reconstruction error percentiles for 180608.HS under different
Eight samples from left to right of the mean trajectories and first nine modes of
the square grid DTMs. To highlight the kinds of deformation sequence that the
modes represent, we have scaled them by six standard deviations.


The encircled regions of (a) and (b) are an example of the kind of appearance 
change in which the colour of a feature (the near-vertical vessel section in (a)) 
fades away. (c) and (d) show some other changes of colour/texture that the DTM 
cannot generate. In some regions, colours seem to flow into each other, but in 
others new colours seem to appear from nowhere. Constructing D; by perturbing 
Qo’ () can account for the former of these occurrences to an extent, but it cannot 
account for the latter. All four images were taken from the 230108.FO1 sequence, 
after lightness normalisation. (a) and (c) are from frame 11 of the sequence, and 
Some frames from a 50 frame deformation sequence sampled from the DTM that
we constructed for the 180608.HS sequence and applied to the region of the first
image surrounding the initial feature state shown in (a). For reference, we have
These figures show histograms for the Dz; and Dy sample sets generated for
some patches from the 230108.FO1 sequence and the 180608.HS sequence. The 
rightmost number underneath each figure is an identifier for the patches — figure 
6.8 shows what these patches looked like. The stretched exponential distribu- 
tions that maximise the conditional likelihood of each sample set are shown as 
The three possible states of Bg(b) relative to cFg(b), for some bin b and some
(a) The red curve shows cF@(0) as a function of 6 for feature 35 of the 180608.HS
sequence, using optimal values for c, az and yz, calculated as described in this 
section and the previous section. The discontinuous blue line segments show 
Ba(6) for the same feature. (b) A graph of f(c) for the same feature. f has
numerous plateaus, which arise due to the fact that cfg does not always intersect


The green and red particles are to compare the blue time t — 1 subregion to time 
t subregions using image information from time 0. As the time t — 1 states of 
the particles are very different, the subregions of the first frame from which they 
will retrieve information (indicated by the small green and red squares) are also 
Screenshots of our particle filter’s performance over the 230108.FO1 video, taken
at 5-frame intervals. The quadrilaterals show the weighted means of the particles,
and the mostly-blue crosses near their centres show the positions of the particles.

at 5-frame intervals. The quadrilaterals show the weighted means of the particles,
and the mostly-blue crosses near their centres show the positions of the particles.

(a), (b) These figures show the changes in the distribution of the number of
lost and non-lost particles over time. The numbers are always multiples of 50, 
which is the number of particles we used. This is because we resampled the lost 
particles whenever possible, meaning that either all of a patch’s particles were 
lost after processing each frame, or none were. (c), (d) These figures show the 
numbers of lost and non-lost particles per patch, accumulated over all frames
that they were tracked over. Again, the values are all multiples of 50.


(a), (b) Error percentiles per frame after partitioning particles according to 
whether or not they were lost. (c), (d) Error percentiles per patch after parti- 
tioning particles according to whether or not they were lost. A pair of dashed 
vertical lines is given for each patch. The lines on the left mark the percentiles 
for lost particles, and the lines on the right mark the percentiles for non-lost 
particles. Patch 69 in 230108.FO1 only has one vertical line, because the loss
test erroneously labelled all of its particles as lost.
tions for each patch in the tests in which we compared the particle filter’s
performance when estimating full affine transformations to its performance when only
estimating patch displacement. The left-hand column of each patch’s results 
gives the error percentiles from the tests in which we estimated full affine
transformations, and the right-hand column gives the displacement-only percentiles.


For each frame of the two videos, these figures show the percentiles of the dis- 
tributions over the 14 patches of the pyramidal KLT tracker’s (p-KLT, [14]) 
displacement errors and the errors in our particle filter’s weighted mean esti- 
mates of the patch displacements. The dashed lines show the p-KLT tracker’s 
results, and the solid lines show our particle filter’s results. In the figures labelled 
“ref.=first”, we used the first frame as the p-KLT tracker’s reference image and 
centred its reference patches over the positions of the landmarks in that frame, 
and in the figures labelled “ref.=previous”, we used the previous frame as the
reference image, and centred its reference patches over its estimates of the positions
Chapter 1


Introduction 


1.1 Motivation and Objectives 


Coronary heart disease is one of the leading causes of death in men and women. It is caused 
by atherosclerosis, a condition in which the coronary arteries, which supply blood to the my- 
ocardium, become blocked by fatty materials and plaque. According to statistics from the 
British Heart Foundation, the disease accounted for 22% of male deaths in the UK in 2002 and
17% of female deaths in the same year [101]. Statistics from the American Heart Association
[65] indicate that about one sixth of all deaths in the USA in 2006 were due to this disease, making
it the country’s leading cause of death.


The two most common procedures used to treat the condition are percutaneous transluminal 
coronary angioplasties (PTCA) and coronary artery bypass grafts (CABG). The former involves 
inserting a balloon catheter into the blocked artery and inflating it so as to open the blockage. 
The latter procedure, used in more serious cases, involves performing a median sternotomy 
(cracking open the patient’s sternum), arresting the heart, performing a cardiopulmonary bypass
(CPB) to maintain the flow of blood, harvesting an inessential artery such as the left
internal thoracic artery (LITA) and grafting this artery onto the blocked coronary artery so as
to bypass the blockage (the grafting process is commonly referred to as an anastomosis).


Although CABGs have been in use for over four decades and have had a high success rate, they
pose a number of problems for the patient. Firstly, the sternotomy is a traumatic procedure, 
which results in a long recovery period of up to three months. In addition, a number of serious 
postoperative complications are associated with CPB, such as organ dysfunction and oedema 
(swelling) of the myocardium and other tissues due to emboli (vessel blockages) [70, 68]. In an 
effort to avoid these problems, a number of minimally-invasive alternatives have been developed
that can be performed on the beating heart.


The minimally-invasive direct coronary artery bypass (MIDCAB) [11, 110] operation is one 
such alternative. It is performed through a small incision in the chest (typically 10-12 cm 
long), alleviating the need for a sternotomy and hence reducing intraoperative blood loss,
the postoperative admission period and the overall recovery period. A mechanical stabiliser is 
used to reduce motion at the anastomosis site so that the operation can be performed on the 
beating heart, rendering a CPB unnecessary. MIDCAB is not without its problems, however:
harvesting the LITA is technically challenging for surgeons due to the limited work space and
the restricted field of view. Thanks to recent advances in technology, however, these limitations
can be overcome with the assistance of surgical robots. Such robots allow totally endoscopic 
coronary artery bypass procedures (TECAB) to be performed, in which the surgeon can view 
the operating field in 3D via stereo endoscopic cameras. The surgeon is able to manipulate the 
robot’s tools with four to six degrees of freedom through an intuitive telemanipulation interface 
that eliminates hand tremor and allows for very precise surgical movements. However, TECAB 
procedures still suffer from a relatively high rate of conversion to either MIDCAB or CABG 
[32, 56, 27]. Reasons for this include misidentification of the coronary artery and difficulty in
locating it due to it being obscured by excessive amounts of fat.


Augmented reality (AR) guidance has the potential to reduce the conversion rate by superimposing
a preoperatively constructed, accurately registered model of the patient’s beating
heart onto the surgeon’s view of the operating field. The feasibility of such guidance has
been demonstrated by previous researchers on phantom and animal data [1, 106, 33], and [63]
demonstrated the use of intraoperative ultrasound images for guidance in robot-assisted liver
surgery.
Figure 1.1: The da Vinci system (a) and a close-up of its endoscope (b). The two large circles
at the bottom of the endoscope are the stereo cameras, and the smaller circle above them is a
light. The copyright of these images is owned by Intuitive Surgical [51].
Figure 1.2: This figure shows some of the ways in which images from monoscopic and/or
stereoscopic cameras can be processed in order to extract shape information for augmented reality.
Shape-from-stereo methods involve exploiting the disparities between corresponding points in a
set of simultaneously captured images in order to estimate depth. Shape-from-shading methods
use lighting models to estimate depth from changes in surface intensity. Motion information
can be used to estimate shape directly, or to place temporal consistency constraints on methods
that normally generate independent shape estimates from one time step to the next.
The top-right image is taken from [80].
The work we are presenting in this thesis was carried out as part of a larger project that had
the goal of producing such an AR system for the da Vinci™ robot by Intuitive Surgical [51]
(shown in figure 1.1). Image guidance in this setting poses particularly severe challenges, as 
the large homogeneous regions on the surface of the heart are a source of great ambiguity. 
In addition to this, the myocardium undergoes non-trivial deformations, and the combination 
of the heart’s moist, shiny appearance and the bright light from the endoscope causes large, 
distracting specular reflections that must be taken into account. So to achieve accurate and 
robust guidance results in this setting, it is vital to exploit as much of the available information 
as possible in order to constrain the process of registering the model to the endoscopic images
as tightly as possible.


There are many different types of visual cue that can be used to extract information from 
endoscopic images, some of which are illustrated in figure 1.2. Accurate estimates of the 
motion of points on the myocardial surface have many important uses, such as placing temporal 
consistency constraints on estimates of shape derived from other methods, or even directly 
estimating shape using structure-from-motion methods [105, 113]. So for this thesis, we have
chosen to focus on the problem of

estimating the trajectories of patches on the surface of the myocardium and the
sequences of deformations that they undergo, as observed through a single camera.


Our long-term goal is to have dense motion sequence estimates for all visible points, or better 
yet, dense 3D motion sequence estimates computed from stereo information. This is a particu- 
larly challenging task for myocardial image sequences however, due to the large homogeneous 
regions, the large amount of deformation that can occur, and, in the case of stereo estimates, 
the narrow camera baseline. So our plan is to begin by developing methods for estimating the 
motion of salient patches as accurately and robustly as possible, so that this information can
be used to constrain a dense motion estimator in future.


It goes without saying that the use of any motion estimation algorithm for image guidance
would require the algorithm to run in real time. However, due to the great difficulty of the
task, we have chosen not to concentrate on the real-time requirement for the time being, so
that we may focus on the equally important issue of how we might go about designing a motion
tracker that can be relied upon.


1.2 Contributions 


The following is a summary of our main contributions and of the structure of the main thesis 


chapters: 


• Chapter 3: In this chapter, we describe a particle-filtering-based framework for tracking
patches on the myocardial surface. Under this framework, we present an importance
sampling algorithm that is defined in terms of the motion likelihoods of small subregions
within the patches. We use a Product-of-Experts-inspired framework to multiplicatively
combine this likelihood information into distributions from which we can draw multiple
hypotheses of each patch’s affine transformations. Furthermore, we provide empirical
evidence that ratios of importance sampling probabilities grow exponentially with the
number of subregions, and we use this observation to devise a simple method for handling
missing likelihood information.

• Chapter 4: In this chapter, we describe the methods that we use to calculate likelihoods.
We propose a dissimilarity function based on the Earth Mover’s Distance, that achieves
insensitivity to outliers by ignoring pixel pair differences that are greater than a
pre-specified percentile of a weighted pixel pair difference distribution. We also propose a
simple method for removing changes in appearance caused by illumination changes.

• Chapter 5: This chapter focuses on methods that can be used to learn non-affine
myocardial deformation models from a set of training data. The main contribution in this
chapter is a B-spline based method for modelling sequences of myocardial deformations,
which involves nesting a PCA model of the distribution of static deformed states within
a PCA model of the distribution of the B-spline control points.

• Chapter 6: This chapter describes our initial attempts to detect and deal with
particles that end up in low-probability states, in which they contribute little to the particle
filter’s estimate of the patch’s posterior distribution. Our contributions are: the use of
“foreground” and “background” likelihood functions to infer when particles are in these
low-probability states; a method for estimating the optimal value of a threshold that we
use in this inference; the use of our deformation models from chapter 5 to simulate data
sets that we can use to estimate the parameters of these likelihood functions and the
parameters of the likelihood functions from chapter 4; and a discussion of various methods
that could be used to restore particles that are thought to be in low-probability states.


These chapters of original contributions begin after the next chapter, in which we describe some 


publications by other researchers that are related to our work. 


1.3 Statement of Originality


I hereby declare all work presented in this thesis to be my own original research, apart from 


work that is credited to other researchers. 


Chapter 2 


Related Work 


The work presented in this thesis draws on results from a diverse range of fields, such as function 
approximation, optimisation, image processing, principal component/functional data analysis, 
Monte Carlo simulation, statistical inference, and various other areas of statistical analysis. 
Rather than attempt to condense the relevant background material from all of these fields 
into a single, inevitably incoherent, chapter, we will reserve this chapter for a discussion of the 
background material that is most pertinent to our overall aim of estimating myocardial motion, 
and discuss all other relevant background material during the course of the remaining chapters, 


as the need arises. 


Significant progress in motion estimation and the identification of correspondences between 
images has been made in restricted environments in which the camera(s) is (are) presumed to 
be observing a static scene, such as “MonoSLAM” (single-camera Simultaneous Localisation 
and Mapping) [25], and multiview stereo reconstruction (see [96] for a summary of recently 
proposed methods and an evaluation of their accuracy). Much research has also gone into 
the design of dense optical flow algorithms (see [6] and [5] for reviews and evaluations), which 
may make weaker assumptions about the motion of objects in the images so that non-rigid 
deformations can be handled well, but typically only attempt to model the motion between 
pairs of images rather than over a whole image sequence. Between these disparate streams of 


research however, there lies a gulf of problems that require accurate estimates of the long-term
behaviour of objects undergoing complex deformations and occlusions. The application for


which we have developed the work presented in this thesis falls into this class. 


Sand and Teller presented a method called “Particle Video” in [91] that attempts to find the 
middle ground between feature tracking algorithms, which typically calculate the trajectories of 
a sparse set of points independently, and optical flow algorithms. The output of their algorithm 
is an estimate of the trajectories of a dense set of “particles” (2D points) that are non-uniformly 
distributed in such a way that the density of particles in any given image region increases with 
the region’s visual complexity. Given a sequence of images, the algorithm uses a black box 
optical flow algorithm to calculate initial estimates of the motions of the particles in-between 
consecutive frames, and refines these estimates by using a variational optimisation algorithm to 
minimise the sum of a data term, that represents the amount by which a particle’s appearance 
changes over time, and a distortion term, that represents the dissimilarity between the motion 


of a particle and the motions of other nearby particles. 


Other researchers have proposed methods specifically for the task of myocardial motion esti- 
mation. One of the simplest methods is the block-matching method used by Ortmaier et al. 
in [77] to estimate the displacement of small myocardial regions based on the RMS intensity 


error. 


Stoyanov et al. proposed a more involved method for estimating the 3D translation of my- 
ocardial regions viewed through a stereo endoscope in [102], by extending the well-known 
Lucas-Kanade registration algorithm [66] to take the epipolar constraint into account*. Their 
method is based on templates defined around stereo feature pairs detected in the initial image 
pair using the Shi-Tomasi and Maximally-Stable Extremal Region feature detectors [99, 67]. 
These pairs are triangulated, and their 3D positions in subsequent images are estimated by 
minimising Lucas-Kanade-style objective functions, defined in terms of the sum of squared dif- 
ferences between the image pairs and linear transformations of the templates that preserve the 


epipolar constraint. 


*The epipolar constraint states that given any stereo pair of images, the set of 3D points that project onto
a single point in one image will project onto a line in the other.


The method described by Mountney et al. in [69] begins similarly, by detecting Shi-Tomasi and
Difference-of-Gaussian features and tracking them using the Lucas-Kanade method (in 2D). 
They then take the first n frames (for some adaptively selected n) of the estimated feature 
trajectories, and use them to construct ID3 decision trees [84] that classify input patches as 
matches or mismatches. These trees are then used to efficiently track each feature by classifying 
all possible features within a neighbourhood of its previous position. The published results 
suggest that the method is more robust than the Lucas-Kanade tracker that provided the 


training data. 


Richa et al. [87] proposed a method for estimating 3D non-rigid myocardial deformation from
stereo pairs by calculating parametric mappings on $\mathbb{R}^3$ which, when projected onto the 2D
camera planes, best describe the correspondences between a reference image and the subsequent
stereo image pairs. The mappings they used were parameterised as Thin-Plate Splines
(which we will discuss in more detail in chapter 5), and again, the objective function that they
minimised was defined in terms of the sum of squared differences between the reference and the
subsequent image pairs.


Despite the encouraging results described by these authors, their methods and experiments
have two shortcomings in common. Firstly, all of their published experiments that
used real intraoperative images (as opposed to synthetic images, or images of phantom hearts) 
were performed on mechanically stabilised hearts, i.e. hearts that have had a metal device 
placed on them that reduces the motion of the anastomosis site (examples of such stabilisers 
are shown in figures 4.6c and 4.6d on page 104). In practice however, image guidance would be 
most useful to the surgeon before the stabiliser is attached, as the anastomosis site cannot be 
stabilised before it has been located (see the descriptions of TECAB on the beating heart given 
in [2, 118]). The myocardium undergoes much more deformation in the absence of a stabiliser, 


and so it is questionable how well the methods above would perform in this setting. 


The second shortcoming (which also applies to the Particle Video method) is that none of these 
methods attempt to model the uncertainty in their motion estimates. It is especially important 


to do this when estimating motion prior to stabilisation, when a tracker is most likely to fail
due to the large motions and be in need of information about other likely states. We believe
that a statistical approach to tracking, in which some representation of a distribution over the
possible feature trajectories is propagated through time, would provide the most natural and
mathematically rigorous mechanisms for modelling the uncertainty.


In our application, as in many practical applications, the dynamics of the features we want to
track and the observable image information are highly non-Gaussian and non-linear, and so
a statistical algorithm that relies on Gaussian and linear assumptions, such as the classic Kalman
filter [55], would almost certainly fare poorly. So instead, we have chosen the more flexible
particle filtering algorithm as a discrete framework within which to model these distributions.


Particle filtering is based on the idea of propagating a finite set of weighted hypotheses
(“particles”) through time, in order to approximate the posterior distribution of some time-varying
random variable of interest, i.e. with the aim of approximating the distribution of that variable 
at each point in time given all of the observations that were available up till that time point. 
We will delve into the mathematical details of the algorithm and our implementation of it in 
the next chapter, and the discussion of different facets of it will continue over most of the rest 


of the thesis. 


Chapter 3 


A Product-of-Experts Approach to 


Importance Sampling 


3.1 Synopsis 


In this chapter, we shall lay the groundwork for the work that we will present in the remaining 
chapters of this thesis. We will begin by describing the mathematical assumptions of the particle 
filtering algorithm that we use to track patches. After this, we shall introduce our first original 
contribution — an importance sampler from which the particle filter may draw hypotheses about 


the current state of the patch that we wish to track. 


We model the importance sampling distribution as a factor of the posterior distribution’s like- 
lihood term, to avoid the instabilities caused by particles that are occasionally drawn with 
extremely low importance sampling probability ending up with extremely high weights. We 
strive to reduce the importance sampling distribution’s uncertainty by multiplicatively com- 
bining local likelihood information from different subregions of each particle’s predecessor. We 
then discuss a generalisation of our importance sampling approach to the case in which some 
of the local likelihood values are undefined (e.g. as a result of occlusions caused by specular 
highlights). This generalisation relies on the assumption that ratios of importance sampling 


probabilities grow exponentially with the number of local likelihood terms that define each
importance sampling value. To justify this assumption, we conclude the chapter with an empirical
study of the extent to which it seems to hold in the most important case, in which one of the 
importance probabilities in each ratio is the maximum of the importance sampling distribution 


that it comes from. 


3.2 Affine Patch Tracking with a Particle Filter 


The general problem particle filters attempt to solve is that of estimating the posterior
distribution of a sequence of hidden states $\Theta_{0:t} = (\Theta_0, \ldots, \Theta_t)$, given a sequence of
observations $Z_{0:t} = (Z_0, \ldots, Z_t)$ that depend on the hidden states. We are interested in tracking a
quadrilateral patch through an image sequence, which we model as the problem of estimating the
sequence of transformation parameters that determine its trajectory, so that the unknown
transformation parameters at time $t$ form the hidden state $\Theta_t$, and the image information at
time $t$ forms the observation $Z_t$.


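To improve the well-posedness of the tracking problem and make the necessary calculations
tractable, we assume that the patch’s transformations are approximately affine. We define the
patch to have a default state in which its four vertices $V = (v_1 \mid \cdots \mid v_4)$ are centred at the
origin. $V(\theta_t)$, the state of the vertices when transformed by $\theta_t$, is then given by

\[ V(\theta_t) = M(V; \theta_t) = A(\theta_t) V + \big( d(\theta_t) \mid \cdots \mid d(\theta_t) \big) , \tag{3.1} \]

where

\[ A(\theta) = R(\theta) S(\theta) K(\theta) =
\begin{pmatrix} \cos\theta_{[r]} & -\sin\theta_{[r]} \\ \sin\theta_{[r]} & \cos\theta_{[r]} \end{pmatrix}
\begin{pmatrix} \theta_{[s_x]} & 0 \\ 0 & \theta_{[s_y]} \end{pmatrix}
\begin{pmatrix} 1 & \theta_{[k_x]} \\ \theta_{[k_y]} & 1 \end{pmatrix} , \]

\[ d(\theta) = (\theta_{[d_x]}, \theta_{[d_y]})^T , \]

\[ \theta = \big[ \theta_{[r]} \in \mathbb{R};\ \theta_{[s_x, s_y]} \in (0, \infty)^2;\ \theta_{[k_x, k_y]} \in \{ (x, y)^T \in \mathbb{R}^2 : xy < 1 \};\ \theta_{[d_x, d_y]} \in \mathbb{R}^2 \big] , \tag{3.2} \]

i.e. $R(\theta)$, $S(\theta)$ and $K(\theta)$ are the rotation, scaling and skewing matrices defined by $\theta$, and
$d(\theta)$ is $\theta$’s translation vector. The matrices have strictly-positive determinants, disallowing
reflection and zero scaling.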


This representation is non-unique due to the possibility of encoding rotation and scaling
information in the skewing parameters $\theta_{[k_x]}$ and $\theta_{[k_y]}$, as shown in figure 3.1, in which the red square
is the result of transforming the blue axis-aligned square by

\[ A = \begin{pmatrix} 1 & -0.5 \\ 0.5 & 1 \end{pmatrix} , \]

which scales it vertically and horizontally by $\sqrt{5/4}$ and then rotates it clockwise by $\arccos(\sqrt{4/5})$.
This can either be achieved with

\[ -\theta_{[k_x]} = \theta_{[k_y]} = 0.5; \quad \theta_{[s_x]} = \theta_{[s_y]} = 1; \quad \theta_{[r]} = 0 , \]

or with

\[ \theta_{[k_x]} = \theta_{[k_y]} = 0; \quad \theta_{[s_x]} = \theta_{[s_y]} = \sqrt{5/4}; \quad \theta_{[r]} = \arccos(\sqrt{4/5}) . \]

So in practice we store $A$ rather than $\theta_{[r, s_x, s_y, k_x, k_y]}$, and we use a slight variant of the QR
decomposition to uniquely calculate the decomposition in eq. (3.2) when we need to use $\theta$’s
components. The details of the decomposition are given in appendix B.1. Its uniqueness comes
about by requiring $\theta_{[k_y]} = 0$. This component is still useful however, as will be seen in section
3.3.3.
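The non-uniqueness described above can be checked numerically. The following is a minimal sketch, assuming the standard forms of the rotation, scaling and skewing matrices and a convention in which the second skewing parameter is fixed at zero. The function names `compose` and `decompose` are hypothetical, and the decomposition shown is a plain QR-style factorisation rather than the variant given in appendix B.1.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def compose(r, sx, sy, kx, ky):
    """Build A = R(r) S(sx, sy) K(kx, ky) (assumed standard matrix forms)."""
    R = [[math.cos(r), -math.sin(r)], [math.sin(r), math.cos(r)]]
    S = [[sx, 0.0], [0.0, sy]]
    K = [[1.0, kx], [ky, 1.0]]
    return matmul(matmul(R, S), K)

def decompose(A):
    """Return (r, sx, sy, kx) with ky fixed at 0, for a matrix with det(A) > 0."""
    (a, b), (c, d) = A
    sx = math.hypot(a, c)             # length of the first column of A
    r = math.atan2(c, a)              # rotation that aligns the x-axis with it
    sy = (a * d - b * c) / sx         # det(A) = sx * sy when ky = 0
    kx = (a * b + c * d) / (sx * sx)  # projection of column 2 onto column 1
    return r, sx, sy, kx

# Two different parameter vectors that produce the same affine matrix.
A_skew = compose(0.0, 1.0, 1.0, -0.5, 0.5)
A_rot = compose(math.acos(math.sqrt(0.8)), math.sqrt(1.25), math.sqrt(1.25), 0.0, 0.0)
```

Calling `decompose(A_skew)` recovers the rotation-and-scale parameterisation with zero skew, which illustrates why storing the matrix and decomposing on demand removes the ambiguity.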


Figure 3.1: An example of the non-uniqueness of the decomposition in eq. (3.2), in which the
red square’s state can be expressed as a transformation of the blue square’s with two different
instantiations of $\theta$.


3.2.1 Probabilistic Approximations


We assume that the initial state $\Theta_0$ of the patch is known to be $\theta_0$ (we specify it manually,
but it could be chosen by an automatic feature detector). By Bayes’ theorem, the posterior
distribution, $\pi_t(\Theta_{1:t})$, can be expressed as follows:

\[ \begin{aligned}
\pi_t(\Theta_{1:t}) &= P(\Theta_{1:t} \mid Z_{0:t}, \Theta_0) \\
&= \frac{1}{K_t} P(Z_{1:t} \mid \Theta_{0:t}, Z_0) \, P(\Theta_{1:t} \mid Z_0, \Theta_0) \\
&= \frac{1}{K_t} \prod_{i=1}^{t} P(Z_i \mid Z_{0:i-1}, \Theta_{0:t}) \, P(\Theta_i \mid \Theta_{0:i-1}, Z_0) ,
\end{aligned} \tag{3.3} \]

where $K_t = P(Z_{1:t} \mid Z_0, \Theta_0)$ is a normalising constant (we write $\Theta_0$ as shorthand for $\Theta_0 = \theta_0$).


To simplify matters, the observation $Z_i$ is often assumed to be conditionally independent of
observations $Z_{0:i-1}$ given the state $\Theta_i$, reducing the likelihood terms $P(Z_i \mid Z_{0:i-1}, \Theta_{0:t})$ to
$P(Z_i \mid \Theta_i)$. We make the following more general assumption however:

\[ \Lambda_i(\Theta_i) = P(Z_i \mid Z_{0:i-1}, \Theta_{0:t}) = P(Z_i \mid Z_{0:i-1}, \Theta_{0:i}) . \tag{3.4} \]


Retaining $Z_{0:i-1}$ and $\Theta_{0:i-1}$ here does not mean that $Z_i$ is always assumed to depend on them.
Rather, it should be interpreted as meaning that we reserve the right to use some subset of
these variables when evaluating $\Lambda_i$, if we wish. The reason for this will be made clear in the
next section.


Removing $Z_i$’s conditional dependency on $\Theta_{i+1:t}$ lends eq. (3.3) a Markov-chain-like structure
in which the hidden states can be recursively sampled, i.e. it allows us to expand $\Theta_{1:t} = \theta_{1:t}$
drawn from the time $t$ posterior $\pi_t$ to a sample from $\pi_{t+1}$, by sampling $\theta_{t+1}$ with probability

\[ \frac{K_t}{K_{t+1}} P(Z_{t+1} \mid Z_{0:t}, \theta_{t+1}, \Theta_{0:t} = \theta_{0:t}) \, P(\theta_{t+1} \mid \Theta_{0:t} = \theta_{0:t}, Z_0) . \tag{3.5} \]


The state transition priors $p_i(\Theta_i) = P(\Theta_i \mid \Theta_{0:i-1}, Z_0)$ are often assumed to follow first-order
Markovian dynamics. However we are only interested in the effects of $\Lambda_i$ and the importance
sampling distributions (which will be defined shortly), since this thesis focuses purely on the
design of these components. So we assume $p_i$ to be uniform, so that it will contribute no
information about $\Theta_i$’s state. Nonetheless, we will retain $p_i$ in our equations for completeness.


Given these assumptions, we are typically interested in estimating the posterior expectations
of functions $f(\Theta_{1:t})$, so that we can, for example, calculate moments of $\Theta_{1:t}$:

\[ E[f(\Theta_{1:t})] = \frac{1}{K_t} \int f(\theta_{1:t}) \prod_{i=1}^{t} \Lambda_i(\theta_i) \, p_i(\theta_i) \, d\theta_{1:t} , \tag{3.6} \]

or more generally, we may wish to find a Fréchet mean $\bar{f}_t$ of $f(\Theta_{1:t})$:

\[ \bar{f}_t = \operatorname*{argmin}_{f'} \int d^2(f', f(\theta_{1:t})) \prod_{i=1}^{t} \Lambda_i(\theta_i) \, p_i(\theta_i) \, d\theta_{1:t} , \tag{3.7} \]

where $d(\cdot, \cdot)$ is a metric over $f$, and in both cases we integrate over each parameter of $\theta$, over
each time step. In practice, such integrals are intractable to calculate, and the normalising
constant $K_t$ is generally unknown. Also, sampling directly from the distribution in eq. (3.5) is
difficult in general, so the integrals cannot be estimated by Monte Carlo integration.


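The standard solution to these problems is to approximate the posterior $\pi_t$ with a set of $n$
positively-weighted samples (or particles), $(\theta_{1:t}^{[1]}, w_t^{[1]}), \ldots, (\theta_{1:t}^{[n]}, w_t^{[n]})$, each drawn from
distributions $r_t^{[j]}(\Theta_t)$, known as the importance sampling distributions. The particles are recursively
updated from one time step to the next as follows:

\[ \theta_t^{[j]} \sim r_t^{[j]}(\Theta_t) ; \qquad
\tilde{w}_t^{[j]} = w_{t-1}^{[j]} \frac{\Lambda_t(\theta_t^{[j]}) \, p_t(\theta_t^{[j]})}{r_t^{[j]}(\theta_t^{[j]})} ; \qquad
w_t^{[j]} = \frac{\tilde{w}_t^{[j]}}{\sum_{k=1}^{n} \tilde{w}_t^{[k]}} ; \tag{3.8} \]

where $\Lambda_t$ may implicitly depend on subsets of the predecessors $\theta_{0:t-1}^{[j]}$ of $\theta_t^{[j]}$, as mentioned
above. The posterior expectation is then approximated as

\[ E[f(\Theta_{1:t})] \approx \hat{f}_t = \sum_{j=1}^{n} f(\theta_{1:t}^{[j]}) \, w_t^{[j]} , \tag{3.9} \]

and Fréchet means can be estimated by replacing $f(\theta_{1:t}^{[j]})$ with $d^2(f', f(\theta_{1:t}^{[j]}))$ and minimising
over $f'$. To weaken the influence of outliers, we only calculate means over the particles that
account for the top $y\%$ of the weights (we typically take the top 90%).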
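As a rough illustration of the update and the truncated mean described above, the sketch below performs one importance-sampling step on scalar particles and then averages only the heaviest particles. The function names and callback signatures are hypothetical, and the reading of "top 90% of the weights" as the smallest set of heaviest particles whose weights sum to at least 0.9 is an assumption.

```python
def sis_step(particles, weights, sample_r, density_r, likelihood, prior):
    """One sequential importance sampling update for scalar particles.

    sample_r(prev) draws a new state; density_r(new, prev) is its importance
    density; likelihood(new) and prior(new, prev) are the posterior factors.
    """
    new_particles, unnormalised = [], []
    for theta_prev, w_prev in zip(particles, weights):
        theta = sample_r(theta_prev)
        w = w_prev * likelihood(theta) * prior(theta, theta_prev)
        unnormalised.append(w / density_r(theta, theta_prev))
        new_particles.append(theta)
    total = sum(unnormalised)
    return new_particles, [w / total for w in unnormalised]

def top_mass_mean(values, weights, mass=0.9):
    """Weighted mean over the heaviest particles that together hold `mass`."""
    ranked = sorted(zip(weights, values), reverse=True)
    kept, accumulated = [], 0.0
    for w, v in ranked:
        kept.append((w, v))
        accumulated += w
        if accumulated >= mass:
            break
    norm = sum(w for w, _ in kept)
    return sum(w * v for w, v in kept) / norm
```

For example, `top_mass_mean([0.0, 1.0, 100.0], [0.5, 0.45, 0.05])` discards the low-weight outlier at 100 and averages only the first two values.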


In our work, we are interested in calculating weighted means of the affine transformations
hypothesised by the particles at each time step. To do this, we calculate the weighted arithmetic
mean of the displacements $d(\theta_t^{[j]})$, and we estimate Fréchet means $\bar{A}_t$ of the affine matrices
$A(\theta_t^{[j]})$ using the method described in [3]:

\[ \bar{A}_t \approx \exp\left\{ \sum_{j=1}^{n} w_t^{[j]} \log A(\theta_t^{[j]}) \right\} , \tag{3.10} \]

where $\exp(\cdot)$ and $\log(\cdot)$ are the matrix exponential and logarithm respectively, which are both
matrix-valued functions of matrices. Efficient and numerically stable algorithms for calculating
these functions are given in [40] and [57] (the matrix logarithm algorithm requires the matrix
square root, which can be calculated using the method in [9]).
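The weighted log-domain mean above can be sketched for 2 × 2 matrices as follows. This is only an illustrative implementation, not the algorithms of the cited references: it assumes each matrix has a positive determinant and no eigenvalues on the negative real axis, uses a closed-form 2 × 2 square root, and relies on truncated series. All function names are hypothetical.

```python
import math

I2 = [[1.0, 0.0], [0.0, 1.0]]

def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lin(a, x, b, y):
    """Return x*a + y*b for 2x2 matrices a, b and scalars x, y."""
    return [[x * a[i][j] + y * b[i][j] for j in range(2)] for i in range(2)]

def sqrtm2(a):
    """Principal square root of a 2x2 matrix (closed form via Cayley-Hamilton)."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return lin(a, 1.0 / t, I2, s / t)

def logm2(a, roots=8, terms=12):
    """Matrix logarithm: repeated square roots, then the series for log(I + X)."""
    for _ in range(roots):
        a = sqrtm2(a)
    x = lin(a, 1.0, I2, -1.0)
    total, power = [[0.0, 0.0], [0.0, 0.0]], I2
    for n in range(1, terms + 1):
        power = mul(power, x)
        total = lin(total, 1.0, power, (-1.0) ** (n + 1) / n)
    return lin(total, 2.0 ** roots, I2, 0.0)

def expm2(a, squarings=8, terms=12):
    """Matrix exponential: scaling and squaring with a truncated Taylor series."""
    a = lin(a, 1.0 / 2.0 ** squarings, I2, 0.0)
    total, term = I2, I2
    for n in range(1, terms + 1):
        term = lin(mul(term, a), 1.0 / n, I2, 0.0)
        total = lin(total, 1.0, term, 1.0)
    for _ in range(squarings):
        total = mul(total, total)
    return total

def log_euclidean_mean(mats, weights):
    """exp of the weighted sum of matrix logarithms (weights assumed to sum to 1)."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for m, w in zip(mats, weights):
        acc = lin(acc, 1.0, logm2(m), w)
    return expm2(acc)

def rotation(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

mean_rotation = log_euclidean_mean([rotation(0.2), rotation(0.6)], [0.5, 0.5])
```

With equal weights, the mean of rotations by 0.2 and 0.6 radians comes out as a rotation by 0.4 radians, whereas an entrywise arithmetic mean of the two matrices would not be a rotation at all.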


To give a very brief explanation of this result, the matrix logarithm maps matrices to a space
that is tangential to the manifold they naturally “live” on (the manifold is a matrix Lie group
and the tangent space is the corresponding Lie algebra). In this space, distances between
matrices can be calculated using the Euclidean metric (to a certain extent), and hence matrices
can be averaged using the (weighted) arithmetic mean. Finally, the tangent space mean can be
mapped back to a matrix in the original space using the matrix exponential. A more thorough
and rigorous explanation would lie far outside the scope of this thesis, so the reader is referred
to textbooks on differential geometry and Lie algebras for a comprehensive treatment of these
matters.


It is worth noting that the matrix logarithm does not always exist, and even when it does, 
it is not always real. [24] gives a necessary and sufficient condition for any matrix to have a 
real logarithm, in terms of the parity of the number of Jordan blocks corresponding to each 
eigenvalue, and [74] specifies cases under which a 2 x 2 matrix has a real logarithm. We did not 
encounter any complex matrix logarithms in any of our tests in this thesis, but nevertheless, 


we describe the geometric significance of these cases in appendix B.2. 


With the exception of the departures from the usual treatment of the likelihood function and 
the prior that we have noted, what we have outlined so far is the simplest common form of 
particle filtering algorithm, known as the Sequential Importance Sampling (SIS) filter [28]. It 
is known to suffer from degeneracy issues, in which all but one particle eventually end up 
with 0 weight. Numerous solutions to this have been proposed. The simplest is the Sampling 
Importance Resampling (SIR) algorithm, in which, after calculating the time t particle weights, 
each particle is resampled, in the sense that it is replaced with particle $\theta_{1:t}^{[j]}$ with probability
$w_t^{[j]}$. An often convenient special case of this is the bootstrap filter [100, 41], in which the prior


is used as the importance sampling distribution. 
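The resampling step of SIR can be sketched in a few lines. This is a minimal multinomial version; resetting the weights to be uniform afterwards is the usual convention and is an assumption here, as is the function name `resample`.

```python
import random

def resample(particles, weights, rng=random):
    """Replace every particle by particle j with probability equal to weight j."""
    n = len(particles)
    drawn = rng.choices(particles, weights=weights, k=n)
    # After resampling, the weight information is carried by duplication.
    return drawn, [1.0 / n] * n
```

Particles with negligible weight tend to disappear while heavy particles are duplicated, which is what counteracts the degeneracy of the plain SIS filter.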


There are many considerations to be made about the design of the particle filter’s constituent 
distributions before we get as far as dealing with degeneracy issues, so we will not begin to 
address them until chapter 6. But for more thorough descriptions of some of the methods that 
have been used to mitigate them, and of other variations of the particle filtering framework, we 


refer the reader to [28, 4].


3.3 Product-of-Local-Likelihoods Importance Sampling


The question now is: given a set of particles up to time $t - 1$, what distributions should
the time $t$ particles be drawn from? A popular choice, used for example in [52], is to model
the prior as a low-order vector autoregressive process with Gaussian noise and take it as the
importance sampler. In this case, the hidden random variables are treated as vectors $\Theta_t^{[j]}$ and
the importance sampler/prior $r_t^{[j]}$ is taken to be the distribution of the $n$th-order linear process:

\[ \Theta_t = \sum_{i=t-n}^{t-1} X_i \Theta_i + y + \epsilon , \tag{3.11} \]

where the $X_i$ are fixed coefficient matrices, $y$ is a fixed vector and $\epsilon$ is a Gaussian random
noise vector (see [12] for a detailed analysis of the (zero-mean) first- and second-order case).
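Sampling from such a linear process can be sketched as below. For brevity the noise is assumed to be isotropic with a single standard deviation, which is a simplification of a general Gaussian noise vector, and the function name and argument layout are hypothetical.

```python
import random

def sample_ar(history, coeffs, offset, noise_std, rng=random):
    """Draw the next state of an n-th order linear process with Gaussian noise.

    history: the last n state vectors, oldest first
    coeffs:  n coefficient matrices (lists of rows), one per past state
    offset:  fixed vector added to the linear prediction
    """
    dim = len(offset)
    mean = list(offset)
    for X, past in zip(coeffs, history):
        for row in range(dim):
            mean[row] += sum(X[row][col] * past[col] for col in range(dim))
    return [m + rng.gauss(0.0, noise_std) for m in mean]

# Constant-velocity model: next = 2 * previous - the one before that.
constant_velocity = [[[-1.0, 0.0], [0.0, -1.0]], [[2.0, 0.0], [0.0, 2.0]]]
prediction = sample_ar([[0.0, 0.0], [1.0, 2.0]], constant_velocity, [0.0, 0.0], 0.0)
```

With the noise switched off, the constant-velocity coefficients simply extrapolate the last displacement, so the prediction from (0, 0) and (1, 2) is (2, 4).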


One advantage of these models is that sampling from them just involves sampling from a 
multivariate Gaussian distribution, which can be done very efficiently. Another is that they 
avoid the problem mentioned in [37] of particles that are occasionally sampled from $r_t^{[j]}$ with
very low probability being given extremely large weights by eq. (3.8), leading to a poor estimate 
of the posterior, which in turn may cause the tracker to drift. This problem does not occur 
in these models because the importance sampling density is equal to a factor of the posterior, 


causing it to disappear from the weight update equation: 


\[ \tilde{w}_t^{[j]} = w_{t-1}^{[j]} \Lambda_t(\theta_t^{[j]}) . \tag{3.12} \]


The main disadvantage is that they do not explicitly use the image information from frames 0 
to t — 1 (although dependencies on it are implicitly created when a particle set is resampled), 
and worse, they are completely independent of the available time t image information. This 
means that the Gaussian term needs to have high variance to account for the large errors that 
will inevitably occur in the deterministic terms when the true motion violates the assumed 
linearity, which in turn makes the model statistically inefficient and typically requires the use 


of a large number of particles (e.g. [52] uses 500-1500).


Another problem is that the linear coefficients and Gaussian parameters of the model must
either be selected on an ad-hoc basis (e.g. under constant velocity or constant acceleration 
assumptions) or learnt offline for the specific kind of motion that is to be tracked. The former 
option is only applicable to very simple tracking problems, whilst the human supervision in- 
volved in the latter can be inconvenient, particularly if many different types of motion are to 


be tracked. 


The inspiration for our importance sampling model comes, instead, from the Product-of-Experts 
model proposed in [49]. The idea behind such models is to multiplicatively combine a set of 
experts that can each constrain different dimensions of a random variable, so as to reduce the 


overall model uncertainty. 


Figure 3.2a illustrates how combining experts in this way can reduce the kind of uncertainty 
that leads to the aperture problem* when estimating the motion of features known to be
undergoing the same translation. In this example, the bent purple line at the bottom moves 
up to the position of the bent line above it, and its motion is to be inferred from likelihood 
information associated with the regions enclosed by the circles. For each encircled region r 
on the lower line, suppose we define a binary likelihood function that returns 1 under the 
hypothesis that r is displaced to an identical circular region, and 0 under all other hypothesised 
displacements. Then the likelihood of a circle on a straight line segment of the lower shape
moving to any point q on a parallel line segment l of the upper shape will be equal for all
q on l (apart from q in the regions near l’s endpoints). Thus, the motion of the circles in
the directions perpendicular to the line segments they lie on can be determined exactly, but 
there is high uncertainty in the directions parallel to their line segments (as shown by the 
bright lines in their likelihood functions). However, enforcing the constraint that all circles 
undergo the same displacement and combining likelihood values multiplicatively allows the 
certain component of motion of circles on the rightmost line segment to cancel out the uncertain 
component of motion of circles on the leftmost line segment, and vice versa. Expressed in terms 


of the likelihood functions, likelihood values for corresponding displacements will be multiplied, 


*The problem of failing to take sufficient contextual information into account when estimating the motion
at a point, leading to high uncertainty along one or more dimensions.


[Figure 3.2 panels: (a) Aperture problem, showing the displacement likelihood functions; (b) Product-of-likelihood solution, with axes dx and dy.]

Figure 3.2: (a) The bent purple line at the bottom moves to the bent line at the top. Ar- 
rows between encircled regions indicate high-likelihood circle translations; each circle has high 
uncertainty in the direction parallel to the line segment it lies on. Pairs of circles not joined 
by arrows have low translation likelihoods. Taking each region’s displacement likelihood as an 
expert, a Product-of-Experts model gives high probability to the displacement indicated with 
solid red arrows, and low probability to all other displacements. (b) The pointwise product of a 
blue circle’s likelihood function and a green circle’s likelihood function annihilates uncertainty, 
producing a delta function. 


producing a delta function that peaks at the intersection of the white lines, as shown in figure
3.2b. Such uncertainty-minimising behaviour can only be achieved by considering information
from the current image $Z_t$, and some subset of the earlier images. This is where our dependency
assumption about the likelihood $\Lambda_t(\Theta_t)$ from eq. (3.4) comes into play.
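The aperture-problem example can be reproduced on a small grid of integer displacements. The sketch below is a toy construction under stated assumptions: each expert is a binary likelihood that is 1 along a line through the true displacement (the direction of its line segment) and 0 elsewhere, and the function names are hypothetical.

```python
def line_likelihood(direction, true_disp, size=5):
    """Binary likelihood over integer displacements (dx, dy) in [-size, size]^2.

    A displacement scores 1 if it differs from the true displacement only
    along `direction`, the orientation of the line segment, and 0 otherwise.
    """
    ux, uy = direction
    tx, ty = true_disp
    grid = {}
    for dx in range(-size, size + 1):
        for dy in range(-size, size + 1):
            # A zero cross product means the offset is parallel to the segment.
            parallel = (dx - tx) * uy - (dy - ty) * ux == 0
            grid[(dx, dy)] = 1 if parallel else 0
    return grid

def product(*experts):
    """Pointwise product of likelihood grids that share the same keys."""
    combined = {}
    for key in experts[0]:
        value = 1
        for expert in experts:
            value *= expert[key]
        combined[key] = value
    return combined

left = line_likelihood((1, 1), (0, 3))    # expert on one line segment
right = line_likelihood((1, -1), (0, 3))  # expert on the other segment
peaks = [disp for disp, value in product(left, right).items() if value == 1]
```

Each expert alone is consistent with a whole line of displacements, but the product is non-zero only at the single displacement (0, 3) on which both agree.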


In addition to this behaviour, we would also like our importance sampler to be a factor of the
posterior, to avoid the low-importance-sampling-probability problem mentioned above. So, we
extract two types of information from $Z_{0:t}$: local information, $Z_{\ell,0:t}$, which, when multiplicatively
combined, allows us to draw hypotheses about the patch’s motion from observations about the
plausible transformations of its subregions; and information about the whole patch, $Z_{L,\{0,t\}}$, which
we use to evaluate the plausibility of the hypothesised patch motions. We then assume that $\Lambda_t$
factorises as follows:

\[ \Lambda_t(\Theta_t) = P(Z_{\ell,t} \mid Z_{\ell,0:t-1}, \Theta_{0:t}) \, P(Z_{L,t} \mid Z_{L,0}, \Theta_{\{0,t\}}) , \tag{3.13} \]

which implies that $Z_{\ell,t}$ is conditionally independent of $Z_{L,t}$. How far from the truth this is
depends on the way in which we process $Z_t$ to define $Z_{L,t}$ and $Z_{\ell,0:t}$, and on the joint distribution
of the pixels within the patch. A thorough examination of this issue would occupy a whole
chapter, and is outside the scope of the issues we are focusing on in this thesis, so we will just
proceed under the assumption that they are conditionally independent. The methods we will
describe in this chapter treat the manner in which $Z_t$ is processed as a black box, so it can
always be changed to better reflect the assumed independence.


Given that the first factor is non-negative, we can take it as an importance sampling distribution
over $\Theta_t$ if we normalise it with respect to $\theta_t$, i.e. if we define it such that

\[ \int P(Z_{\ell,t} \mid Z_{\ell,0:t-1}, \theta_t, \Theta_{0:t-1}) \, d\theta_t = 1 . \tag{3.14} \]

To make it clear that it is a distribution over $\Theta_t$ and that the second factor is a function that
evaluates the plausibility of $\Theta_t$, we will use the following synonyms from now on:

\[ \begin{aligned}
\xi_t(\Theta_t; \Theta_{0:t-1}, Z_{\ell,0:t}) &= P(Z_{\ell,t} \mid Z_{\ell,0:t-1}, \Theta_{0:t}) \\
L_t(\Theta_t; \Theta_0, Z_{L,\{0,t\}}) &= P(Z_{L,t} \mid Z_{L,0}, \Theta_{\{0,t\}}) .
\end{aligned} \tag{3.15} \]


Dependence issues aside, the enforcement of the normalisation condition in eq. (3.14) is justifiable,
since any real function $f(x)$ can be factorised as $f(x) = g(x)h(x)$, where $h$ is an arbitrary real
function satisfying $0 < h(x) < \infty$ for all $x$, and $g(x) = f(x)/h(x)$. In our case, $h$ is $\xi_t$, and the
lack of formal requirements on $\Lambda_t$’s ($f$’s) form (beyond it being a distribution over $Z_t$) gives
us considerable freedom to define $L_t$ ($g$) as we please. Extracting a probability distribution
factor from $\Lambda_t$ like this may be compared to the use of sampling distributions in Monte Carlo
integration (see [47] for a review).


Taking $\xi_t$ as our importance sampling distribution for each particle $j$ reduces the unnormalised
weights $\tilde{w}_t^{[j]}$ from the expression in eq. (3.8) to

\[ \tilde{w}_t^{[j]} = w_{t-1}^{[j]} L_t(\theta_t^{[j]}; \theta_0, Z_{L,\{0,t\}}) \, p_t(\theta_t^{[j]}) . \tag{3.16} \]
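The factorisation argument can be illustrated with a one-dimensional Monte Carlo integral. The sketch below is a toy analogy rather than the thesis's sampler: the target `f`, the normalised factor `h` (a standard normal density) and the remaining factor `g` are all chosen purely for illustration, and samples are drawn from `h` while only `g` is evaluated, in the same way that particles are drawn from the normalised factor and weighted by the other.

```python
import math
import random

def f(x):
    """Unnormalised target whose integral over the real line is sqrt(pi)."""
    return math.exp(-x * x)

def h(x):
    """Standard normal density: strictly positive and integrates to 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def g(x):
    """Remaining factor, so that f(x) = g(x) * h(x)."""
    return f(x) / h(x)

def estimate_integral(n, rng):
    """Estimate the integral of f by sampling from h and averaging g."""
    return sum(g(rng.gauss(0.0, 1.0)) for _ in range(n)) / n

estimate = estimate_integral(20000, random.Random(0))
```

Because the samples come from `h`, its density never appears in the estimator, so no sample can receive an extreme weight merely because it was drawn with low sampling probability.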


The closest work to our importance sampling method (which we will describe in detail in the
following subsections) that we are aware of is a pair of articles from the neuroscience literature
by Weiss et al. [115, 116] describing Bayesian models of visual motion perception. Their
experiments involved calculating MAP (maximum a posteriori) estimates of a 50-parameter 
deformation field model, and comparing the result to the results perceived by humans. The 
likelihood term of their model was defined as the product of local likelihood terms from each 
pixel in a reference image (given the deformation field parameters), and the prior favoured mo- 
tion that was slow and spatially smooth. The key differences between their approach and ours 
are that while the deformation fields they estimate can describe more complex motions than 
the affine transformations we estimate, their model was designed to have a Gaussian posterior 
(presumably so as to facilitate MAP estimation), which does not allow for the possibility of 
multiple modes, and their model does not take into account temporal dependencies between 
the flow field parameters. Furthermore, they only tested their model on the simple synthetic 
stimuli used in psychophysical visual perception tests, such as rotating barbers’ poles, trans- 
lating plaids, etc., and the moving object in each test image sequence only underwent rigid 
transformations. So it is questionable how well their method would perform on natural images 


with complex deformations, like our intraoperative image sequences. 


3.3.1 Local Experts 


The experts we use to define the importance sampling distribution are descriptors of subregions within patch $V(\Theta_{t-1}^{[j]})$. We define the centres of all subregions within each image to form a regular grid $G = \{i\delta_g : i \in \mathbb{N}\}^2$, where the scalar $\delta_g$ is the space between grid vertices, and we are interested in the vertices $G_{t-1}^{[j]} \subset G$ that lie within the boundary of $V(\Theta_{t-1}^{[j]})$. For each $g \in G$, the subregion $S(g)$ is defined by the image-boundary-aligned square centred at $g$ with side length $2r_s + 1$ pixels, for some $r_s \in \mathbb{N}$ (see figure 3.3a). We will refer to the $i$th $g \in G_{t-1}^{[j]}$ as $g_{t-1,i}^{[j]}$.

Figure 3.3: (a) The crosses are grid vertices in G. The coloured crosses denote grid vertices lying within the red patch. The blue cross and square denote a subregion of the patch. The patch’s vertices lie within the shaded grid squares. (b) After an affine transformation in which the patch’s vertices remain within the shaded grid squares, only a few red grid vertices leave the patch (the ones with black circles around them), and only a few black ones enter it (the ones with red circles around them).


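The effect of a hypothesised transformation $\Theta_t^{[j]}$ on any point $p$ within patch $V(\Theta_{t-1}^{[j]})$ is a displacement to $M(p; \Theta_{t-1}^{[j]}, \Theta_t^{[j]})$, where

\begin{align}
M(p; \theta_a, \theta_b) &= M\big(M^{-1}(p; \theta_a); \theta_b\big) \\
M^{-1}(P; \theta_a) &= A^{-1}(\theta_a)\big(P - (d(\theta_a) \,|\, \cdots \,|\, d(\theta_a))\big) . \tag{3.17}
\end{align}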


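So given a non-negative dissimilarity function $\Delta_{\psi,t}[S, S']$ that compares a descriptor of subregion $S$ (using information from any earlier image between times $0$ and $t-1$) to a descriptor of subregion $S'$ in $Z_t$, expert $i$ might give the likelihood of transformation $\Theta_t$ as

\[ {\ell'}^{[j]}_{t,i}(\Theta_t) = \ell'\big(\Theta_t; g^{[j]}_{t-1,i}, \Theta^{[j]}_{t-1}\big) = \upsilon\Big(\Delta_{\psi,t}\big[S(g^{[j]}_{t-1,i}),\, M(S(g^{[j]}_{t-1,i}); \Theta^{[j]}_{t-1}, \Theta_t)\big]; \alpha_\psi, \gamma_\psi\Big) , \tag{3.18} \]

where, with slight abuse of notation, $M$ is applied to each pixel coordinate in $S(g^{[j]}_{t-1,i})$, and

\[ \upsilon(\Delta; \alpha, \gamma) = e^{-U(\Delta; \alpha, \gamma)} , \qquad U(\Delta; \alpha, \gamma) = \left(\frac{\Delta}{\gamma}\right)^{\alpha} . \tag{3.19} \]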


We will give some of the motivating factors behind our choice of a stretched exponential form for $\upsilon$ in section 6.4.3. Viewing $\upsilon$ as an unnormalised distribution over $\Delta_\psi$, the hyperparameters $\alpha_\psi$ and $\gamma_\psi$ control the distribution's kurtosis and inverse decay rate (standard deviation) respectively. In keeping with MRF terminology, we refer to $U(\Delta_\psi; \alpha_\psi, \gamma_\psi)$ as $\upsilon$'s energy. More generally, whenever we refer to the energy of any nonnegative function $f(x)$ with supremum $f^\dagger < \infty$, we mean the nonnegative quantity

\[ -\ln\frac{f(x)}{f^\dagger} . \tag{3.20} \]

Figure 3.4: A 30° clockwise subregion rotation about the cross in the centre can be approximated by rotating the subregion centres, but preserving the subregions’ alignment with the image boundaries.

We will discuss the precise form of $\Delta_{\psi,t}$ that we used in the next chapter.
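The relationship between the stretched exponential of eq. (3.19) and the general notion of energy in eq. (3.20) can be checked with a few lines of code. This is a minimal sketch; the function names and the example values of the dissimilarity and hyperparameters are assumptions made for illustration.

```python
import math


def stretched_energy(delta, alpha, gamma):
    """U(delta; alpha, gamma) = (delta / gamma) ** alpha for a non-negative dissimilarity."""
    return (delta / gamma) ** alpha


def stretched_exponential(delta, alpha, gamma):
    """Unnormalised likelihood exp(-U); equals 1 (its supremum) at delta = 0."""
    return math.exp(-stretched_energy(delta, alpha, gamma))


def energy_of(value, supremum):
    """Energy of a non-negative function value relative to its supremum: -ln(value / supremum)."""
    return -math.log(value / supremum)


# Hypothetical values: dissimilarity 2.0, shape 1.5, scale 4.0.
likelihood = stretched_exponential(2.0, 1.5, 4.0)
recovered = energy_of(likelihood, 1.0)
```

Because the supremum of the stretched exponential is 1, applying eq. (3.20) to it recovers $U$ exactly, which is why $U$ can be called its energy without ambiguity.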


3.3.2 Discrete, Bounded-Displacement Energy Functions 


For small subregion radii $r_s$, we can approximate $M$'s effects on $S(g^{[j]}_{t-1,i})$ by uniformly displacing all of its points by

\[ d^{[j]}_i(\Theta_t) = d\big(g^{[j]}_{t-1,i}; \Theta^{[j]}_{t-1}, \Theta_t\big) = M\big(g^{[j]}_{t-1,i}; \Theta^{[j]}_{t-1}, \Theta_t\big) - g^{[j]}_{t-1,i} , \tag{3.21} \]

so that $S(g^{[j]}_{t-1,i})$ maintains its alignment with the image boundaries (e.g. see figure 3.4). This allows us to turn the calculation of the local likelihood on its head by precalculating the energies associated with various displacements of $S(g^{[j]}_{t-1,i})$ over a discrete, bounded domain, and then referring to these values when evaluating ${\ell'}^{[j]}_{t,i}$.
To make this more precise, let

\[ D \triangleq \{ i\delta_d : i \in \mathbb{Z},\ |i|\delta_d \le d^\dagger \}^2 \tag{3.22} \]

be the set of subregion displacements under consideration, for some displacement sample spacing factor $\delta_d$. Then we define a displacement energy function $\psi^{[j]}_t(\cdot\,; g)$ for each $g \in G^{[j]}_{t-1}$ as

\[ \psi^{[j]}_t(d \in D; g) = U\big(\Delta_{\psi,t}[S(g), S(g) + d]; \alpha_\psi, \gamma_\psi\big) , \tag{3.23} \]

where, with further abuse of notation, $S(g) + d$ displaces every point in $S(g)$ by $d$ (see figures 3.5a and 3.5b for a real example of $\psi^{[j]}_t$ after negation and exponentiation). We can then approximate ${\ell'}^{[j]}_{t,i}(\Theta_t)$ by interpolating the energies $\psi^{[j]}_t(\cdot\,; g^{[j]}_{t-1,i})$ using the closest displacements $d \in D$ to $d^{[j]}_i(\Theta_t)$. This of course assumes that $d^{[j]}_i(\Theta_t)$ lies within $D$'s boundary. Section 3.3.4 will deal with the case where this is not true.
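A minimal sketch of the lookup described above is given here. It assumes bilinear interpolation over a square table of precalculated energies, which is one possible choice (the text only says that the closest displacements are interpolated); the function names and the toy energy table are hypothetical. Displacements outside the domain return `None`, mirroring the undefined case that section 3.3.4 handles.

```python
def displacement_axis(spacing, max_disp):
    """One axis of D: multiples of `spacing` whose magnitude does not exceed `max_disp`."""
    n = int(max_disp // spacing)
    return [i * spacing for i in range(-n, n + 1)]


def interpolate_energy(table, axis, d):
    """Bilinearly interpolate table[iy][ix] at displacement d = (dx, dy).

    Returns None when d lies outside the boundary of D (energy undefined).
    """
    spacing = axis[1] - axis[0]
    lo, hi = axis[0], axis[-1]
    dx, dy = d
    if not (lo <= dx <= hi and lo <= dy <= hi):
        return None
    fx = (dx - lo) / spacing
    fy = (dy - lo) / spacing
    ix = min(int(fx), len(axis) - 2)
    iy = min(int(fy), len(axis) - 2)
    tx, ty = fx - ix, fy - iy
    row0 = (1 - tx) * table[iy][ix] + tx * table[iy][ix + 1]
    row1 = (1 - tx) * table[iy + 1][ix] + tx * table[iy + 1][ix + 1]
    return (1 - ty) * row0 + ty * row1


# Toy table: a linear energy surface, which bilinear interpolation reproduces exactly.
axis = displacement_axis(2, 40)
table = [[10.0 + 0.1 * x + 0.2 * y for x in axis] for y in axis]
value = interpolate_energy(table, axis, (3.0, -5.0))
```

With a spacing of 2 pixels and a maximum displacement of 40 pixels, each axis has 41 samples, so the table has 1681 entries and can be computed once per grid point.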


The advantage of this approximation is that when $\Delta_{\psi,t}$ describes $S(g^{[j]}_{t-1,i})$'s appearance in terms of $Z_{t-1}$'s information only, $\psi^{[j]}_t(\cdot\,; g^{[j]}_{t-1,i})$ will be the same for all particles that contain grid point $g^{[j]}_{t-1,i}$, and so it can be cached and reused. We will discuss the use of earlier images in chapter 6. Note also that while the approximation inaccuracy generally increases with $r_s$, the accuracy can be improved by calculating $\Delta_{\psi,t}$ with descriptors that are invariant to subregion rotation.


3.3.3 A First-Order Approximation of gl 


Now that we have discussed how to calculate the local likelihoods, the importance sampling distribution can, in theory, be defined as


Seba, 


eeu tly ae 
i=1 ole ) 


el 1(g,; QU) 1» Zo:t) = (3.24) 
Figure 3.5: (a) A patch on the myocardium (green square), and one of its subregions $S(g^{[j]}_{t-1,i})$ (blue square). The time $t$ image is omitted, but it is mostly the result of a small displacement of the shown time $t-1$ image downwards and to the right. (b) A subregion displacement likelihood function for $S(g^{[j]}_{t-1,i})$'s displacements, defined as $e^{-\psi^{[j]}_t(\cdot\,; g^{[j]}_{t-1,i})}$. Light regions denote high displacement likelihood (low energy), and dark regions denote low likelihood (high energy).

Unfortunately, the computational cost of calculating, or discretising and estimating, the integral would be prohibitive, due to $\Theta^{[j]}_t$'s high dimensionality, and so it would be difficult to sample from the importance sampling distribution in this form. This problem can be overcome by Gibbs sampling.

Momentarily considering the importance sampling distribution as just a distribution over $\Theta^{[j]}_t$, and omitting its notated dependence on the images for clarity, Bayes' theorem gives us the following factorisation for any permutation $\omega(i \in \{0, \ldots, 3\})$ of $\Theta$'s four component labels $[\phi]$, $[s_x, s_y]$, $[\kappa_x, \kappa_y]$ and $[d_x, d_y]$:

\[ P\big(\Theta^{[j]}_t = \theta\big) = P\big(\Theta^{[j]}_{t,\omega(0)} = \theta_{\omega(0)} \mid \Theta^{[j]}_{t,\omega(1:3)} = \theta_{\omega(1:3)}\big)\, P\big(\Theta^{[j]}_{t,\omega(1:3)} = \theta_{\omega(1:3)}\big) . \tag{3.25} \]

Hence, if we have a random variable $\Theta'_{\omega(1:3)} = \theta_{\omega(1:3)}$ drawn from the second distribution in this factorisation, we only need to sample $\Theta'_{\omega(0)}$ from the low-dimensional conditional distribution (which is referred to as a local characteristic of the distribution in the Gibbs sampling literature) to create a sample $\Theta'$ from the importance sampling distribution.


It is easy to approximately sample $\Theta'_{\omega(0)}$ from the local characteristic in our case. We begin by bounding and discretising $\Theta'_{\omega(0)}$'s domain, giving a set $X_{\omega(0)}$. Then for each $x \in X_{\omega(0)}$, let $\Theta'_{\omega(0)}(x) = x$, and $\Theta'_{\omega(1:3)}(x) = \theta_{\omega(1:3)}$. Vector $d^{[j]}_i(\Theta'(x))$ is the displacement that $\Theta'(x)$ induces on $g^{[j]}_{t-1,i}$, and the energy $\psi_i(x)$ associated with this displacement can be estimated by interpolating $\psi^{[j]}_t(\cdot\,; g^{[j]}_{t-1,i})$ around $d^{[j]}_i(\Theta'(x))$, as described in the previous section. So by our Product-of-Experts model, we can approximate the local characteristic $P(\Theta'_{\omega(0)} = x \mid \Theta'_{\omega(1:3)})$ by multiplying together the local likelihoods defined by the subregion displacement energies $\psi_i(x)$, i.e.

\[ k\,P\big(\Theta'_{\omega(0)} = x \mid \Theta'_{\omega(1:3)}\big) \approx f(x) = \prod_{i=1}^{|G^{[j]}_{t-1}|} e^{-\min(\psi_i(x),\, \psi^\dagger)} = \exp\Big\{ -\sum_{i=1}^{|G^{[j]}_{t-1}|} \min\big(\psi_i(x), \psi^\dagger\big) \Big\} , \tag{3.26} \]

where $k$ is a normalising constant and $\psi^\dagger$ is an upper bound on the energies that we use to help stave off floating-point underflow. Finally, we evaluate $f(x)$ at all $x \in X_{\omega(0)}$, calculate $k = \sum_{x \in X_{\omega(0)}} f(x)$, sample a bin $x'$ with probability $k^{-1} f(x')$, and then draw a sample within $x'$ with uniform probability. The accuracy of this approximation increases with the resolution at which $X$ and $D$ are discretised, and it also depends on the error caused by the assumption that subregion points undergo uniform displacement when calculating $\psi^{[j]}_t$.
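The discretised sampling step can be sketched as follows. This is an illustration under stated assumptions rather than the thesis code: the per-state energies are taken as already interpolated, the states are one-dimensional bins described by their centres and a common width, and all names are hypothetical.

```python
import math
import random


def unnormalised_probs(energies, max_energy):
    """f(x) = exp(-sum_i min(psi_i(x), psi_max)) for each state x, as in eq. (3.26)."""
    return [math.exp(-sum(min(e, max_energy) for e in state)) for state in energies]


def sample_state(bin_centres, f_values, bin_width, rng):
    """Pick a bin with probability f / k, then draw uniformly within that bin."""
    k = sum(f_values)
    threshold = rng.random() * k
    cumulative = 0.0
    chosen = len(f_values) - 1  # fallback guards against rounding at the upper end
    for index, value in enumerate(f_values):
        cumulative += value
        if threshold < cumulative:
            chosen = index
            break
    return bin_centres[chosen] + (rng.random() - 0.5) * bin_width


# Three hypothetical states with two subregion energies each; 40.0 is capped at 30.0.
f_values = unnormalised_probs([[5.0, 5.0], [0.1, 0.2], [40.0, 1.0]], 30.0)
rng = random.Random(0)
draws = [sample_state([-2.0, 0.0, 2.0], f_values, 2.0, rng) for _ in range(100)]
```

The uniform jitter within the chosen bin is what turns the discrete distribution over bin centres into a piecewise-constant density over the continuous parameter.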


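However, the second distribution in eq. (3.25) still has too many dimensions to be sampled from directly. The Gibbs sampling process solves this asymptotically by iteratively sampling from local characteristics like the one above. Given an initial state ${\Theta'}^{(0)}$, iteration $n$ updates the previous sample as follows:

\begin{align}
{\Theta'}^{(n)}_{\omega(0)} &\sim P\big(\Theta'_{\omega(0)} \mid \Theta'_{\omega(1:3)} = {\Theta'}^{(n-1)}_{\omega(1:3)}\big) \\
{\Theta'}^{(n)}_{\omega(1)} &\sim P\big(\Theta'_{\omega(1)} \mid {\Theta'}^{(n)}_{\omega(0)}, {\Theta'}^{(n-1)}_{\omega(2:3)}\big) \\
{\Theta'}^{(n)}_{\omega(2)} &\sim P\big(\Theta'_{\omega(2)} \mid {\Theta'}^{(n)}_{\omega(0:1)}, {\Theta'}^{(n-1)}_{\omega(3)}\big) \\
{\Theta'}^{(n)}_{\omega(3)} &\sim P\big(\Theta'_{\omega(3)} \mid {\Theta'}^{(n)}_{\omega(0:2)}\big) \tag{3.27}
\end{align}

(we have only fully notated the fixed components of $\Theta'$ in the first line, to avoid clutter).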


Geman and Geman proved a convergence result for Gibbs sampling in [36] which, in our case, implies that ${\Theta'}^{(n)}$ will converge to a sample from the importance sampling distribution as $n$ tends to infinity, regardless of the value of the initial state ${\Theta'}^{(0)}$ (naturally, the rate of convergence increases if ${\Theta'}^{(0)}$ is near a mode of the distribution). The original proof was for discrete random variables, but as explained in [119], it extends to continuous random variables, like $\Theta^{[j]}_t$.


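We take the initial state to be ${\Theta'}^{(0)} = \Theta^{[j]}_{t-1}$, and carry out just a single iteration, under the assumption that the importance sampling distribution at time $t-1$ is similar to the one at time $t$, so that $\Theta^{[j]}_{t-1}$ should not be too far from a mode of the latter (the use of multiple particles helps to mitigate the inaccuracy of this assumption). We start by sampling a displacement vector update $d'$ over domain $D$ and updating $\Theta'$:

\[ \Theta'_{[d_x,d_y]} := \Theta'_{[d_x,d_y]} + d' + \big(U - (0.5, 0.5)^T\big)\delta_d , \tag{3.28} \]

where $U$ is a uniform random vector over $[0,1)^2$ with independent components. Then, using the new value of $\Theta'$ to calculate subregion displacements $d^{[j]}_i(\Theta')$ in accordance with eqs. (3.1) and (3.2), we sample a rotation angle update $\phi'$ over domain

\[ R \triangleq \{ i\delta_\phi : i \in \mathbb{Z},\ |i|\delta_\phi \le \phi^\dagger \} , \tag{3.29} \]

and update $\Theta'$ again:

\[ \Theta'_{[\phi]} := \Theta'_{[\phi]} + \phi' + (U - 0.5)\delta_\phi , \tag{3.30} \]

where $U$ is a uniform random scalar over $[0,1)$. We repeat this process to sample a skew update $\kappa'$ over

\[ K \triangleq \{ (i\delta_\kappa, 0)^T : i \in \mathbb{Z},\ |i|\delta_\kappa \le \kappa_x^\dagger \} \cup \{ (0, i\delta_\kappa)^T : i \in \mathbb{Z},\ |i|\delta_\kappa \le \kappa_y^\dagger \} , \tag{3.31} \]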


and a scale update matrix S’ over 


A 


SOM ios,s,)) = {diag(é:,, 5%) : 4, € Z, dll < st dK < st} , (3.32) 


Sy 
which updates $\Theta'$'s scale parameters as:

\[ \Theta'_{[s_x,s_y]} := \mathrm{diag}\big(\delta_s^{U_0 - 0.5}, \delta_s^{U_1 - 0.5}\big)\, S'\, \Theta'_{[s_x,s_y]} , \tag{3.33} \]

where, again, $U_0$ and $U_1$ are independent uniform random scalars over $[0,1)$. Finally, the new hypothesised hidden state is $\Theta^{[j]}_t = \Theta'$.


The parameters $\delta_\phi > 0$, $\delta_\kappa > 0$ and $\delta_s > 1$ determine the discretisation resolutions of $R$, $K$ and $S$ respectively, and $\phi^\dagger$, $\kappa_x^\dagger$, $\kappa_y^\dagger$ and $s^\dagger$ determine the boundaries of these sets*. We calculate the discretisation resolution parameters so that an increase in rotation of $\delta_\phi$, an increase in skew of $\delta_\kappa$, or an increase in either scale parameter by a factor of $\delta_s$ causes the furthest vertex from the centre of the patch to move by one pixel. The details of these calculations are given in appendix B.3.


Again, if $\Delta_{\psi,t}$ only uses $Z_{t-1}$'s information in the description of each $S(g^{[j]}_{t-1,i})$, we can make an approximation that allows us to cache the conditional distributions for displacement updates and reuse them when processing other particles. The approximation is based on the observation that if the vertices of the red patch in figure 3.3a were moved within the shaded grid squares that they lie in, a few grid points near the patch's boundary might enter or leave the patch, but most grid points would not cross the boundary, as demonstrated in figure 3.3b. Given this, for any patch defined by a time $t-1$ particle transformation $\Theta^{[j]}_{t-1}$, the four grid squares that the patch's vertices lie in can be taken as a hash table key that roughly identifies grid points $G^{[j]}_{t-1}$. So at the beginning of time step $t$, we calculate these hash keys and group together all particles sharing a key value, as in figure 3.6. For each group $Q$ with key $k_Q$, we calculate the weighted average of the affine matrices $A(\cdot)$ and displacements $d(\cdot)$ over $Q$'s particles (using the particle weights), giving an average transformation $\Theta_Q$. Then for each particle $j$ in $Q$, we define $G^{[j]}_{t-1}$ as the grid points within the boundary of $V(\Theta_Q)$. Thus, the conditional distribution for displacement updates over $D$ will be the same for all particles in $Q$, and can be cached by key $k_Q$.
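The grouping step can be sketched as below. This is a simplified illustration: patches are represented directly by their four vertex coordinates, the key is the tuple of grid-square indices containing those vertices, and the weighted averaging of the transformations within each group is omitted. The function names and example coordinates are assumptions.

```python
import math
from collections import defaultdict


def hash_key(vertices, grid_spacing):
    """Key = indices of the grid squares that contain each patch vertex, in vertex order."""
    return tuple(
        (math.floor(x / grid_spacing), math.floor(y / grid_spacing))
        for x, y in vertices
    )


def group_particles(patches, grid_spacing):
    """Map each key to the indices of the particles whose patches share it."""
    groups = defaultdict(list)
    for index, vertices in enumerate(patches):
        groups[hash_key(vertices, grid_spacing)].append(index)
    return dict(groups)


# Hypothetical patches: the first two differ slightly, the third is shifted by a grid square.
patches = [
    [(16, 16), (116, 16), (116, 116), (16, 116)],
    [(20, 18), (118, 17), (119, 118), (17, 119)],
    [(31, 16), (131, 16), (131, 116), (31, 116)],
]
groups = group_particles(patches, 15)
```

Any per-group quantity, such as the cached displacement distribution, could then be stored in a dictionary indexed by the same key.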


*$\kappa_x^\dagger$ and $\kappa_y^\dagger$ may be different, as $\Theta'$ must satisfy $\Theta'_{[\kappa_x]}\Theta'_{[\kappa_y]} < 1$, so as to ensure that the determinant of the hypothesised transformation remains strictly positive.

Figure 3.6: Dashed parallelograms denote patches defined by time $t-1$ particles. Patches sharing a hash key (defined by the shaded squares that patch vertices lie in) are grouped together by colour, and the group averages are represented by the solid parallelograms.

3.3.4 Local Likelihood Energy Problems

The energies $\psi_i(x)$ involved in the calculation of each conditional distribution $k^{-1} f(x)$ may be undefined for one of the following reasons:

a) the displacement $d^{[j]}_i(\Theta'(x))$ may lie outside of $D$'s boundary;

b) subregions $S(g^{[j]}_{t-1,i})$ and/or $S(g^{[j]}_{t-1,i}) + d$ (for some $d \in D$) may have high proportions of unobservable pixels, either as a result of a mask indicating occlusion by specular highlights/foreground objects, or as a result of the subregions lying on or outside the image boundary.


Intuitively, a Product-of-Experts model of the importance sampling distribution should be able to handle the absence of a few energy values, especially if the defined energy values still provide reliable information (e.g. if the subregions for which energies are defined lie on non-parallel edges, as in figure 3.2a). So we would like a way of handling undefined energies that is equivalent to eq. (3.26) when all the desired information is available, and that degrades gracefully as information is lost.


In the case where $m$ of the energy functions $\psi_i(x)$ are defined everywhere in their domains and the remaining $|G^{[j]}_{t-1}| - m$ energy functions are undefined everywhere in their domains, we can simply omit the undefined terms from eq. (3.26), as this is equivalent to only using $m$ of the subregions around the patch's grid points $G^{[j]}_{t-1}$. As far as that equation is concerned, this case is equivalent to the case in which there are exactly $m$ defined energy terms for each transformation state $x$ in the domain of the conditional distribution $k^{-1} f(x)$, but some energy functions are only partially defined. So we may handle this latter case in the same way, omitting all the undefined terms from eq. (3.26). This point is illustrated in figure 3.7.
Figure 3.7: The graphs in this figure represent two possible realisations of three energy functions. In (a), the energy function at the bottom is undefined everywhere in its domain (as indicated by the red crosses), and in (b) each energy function is partially undefined. In both cases, the total number of defined energy terms for each domain state $x$ over the three functions is two. Furthermore, the sum of the defined energies over all three functions is the same for each corresponding state $x$ in the two figures (the colours of the bars indicate that (b) is simply the result of swapping the values of the energy functions in (a)). Thus under eq. (3.26), both sets of energy functions would define the same conditional distribution $k^{-1} f(x)$ if we simply omitted the undefined energies.


In the more general case, in which the number of energy terms defined for each state $x$ varies over $x$'s domain, it would not make sense to simply ignore the missing energy terms, for the same reason that it would not make sense to compare the total height of 300 5-year-olds to the total height of 100 7-year-olds when trying to experimentally demonstrate the fact that 7-year-olds are usually taller than 5-year-olds. That is to say that ignoring the undefined energy terms and using the total of the defined energies could lead to the case where less probability is given to a state $x$, at which a large number of energy terms are defined and all are low, than to a state $x'$, at which only a few energy terms are defined and all are high (this would happen if the sum of the low energies was greater than the sum of the high energies). The obvious solution when comparing the heights of different age groups would be to compare the average heights of the groups rather than the total heights. But using the average of the defined energies rather than their total would not satisfy the requirement that our solution to the missing likelihood problem should reduce to eq. (3.26) when all of the energy terms are defined. This requirement is desirable because it brings about the possibility of sampling probability ratios growing at an exponential rate with respect to the number of subregions, causing the sampling distributions to approach delta functions even when the original likelihood functions have high degrees of uncertainty, and hence giving our importance sampler its uncertainty-minimising behaviour.
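The failure mode described above is easy to reproduce numerically. In this sketch (hypothetical energy values, with `None` marking an undefined term), the state with many low energies has a larger energy total than the state with two high energies, so the naive total ranks the states the wrong way round, whereas comparing mean energies under a common order of magnitude does not.

```python
import math


def total_energy(energies):
    """Sum of the defined energies only (the naive treatment of missing terms)."""
    return sum(e for e in energies if e is not None)


def mean_energy(energies):
    """Mean of the defined energies."""
    defined = [e for e in energies if e is not None]
    return sum(defined) / len(defined)


state_x = [1.0] * 20                      # many defined terms, all low
state_x_prime = [8.0, 8.0] + [None] * 18  # few defined terms, all high

naive_x = math.exp(-total_energy(state_x))
naive_x_prime = math.exp(-total_energy(state_x_prime))

# A common order of magnitude: the mean number of defined terms over both states.
order = (20 + 2) / 2
scaled_x = math.exp(-order * mean_energy(state_x))
scaled_x_prime = math.exp(-order * mean_energy(state_x_prime))
```

The second pair of values keeps the exponent proportional to a number of subregions, which is the property that a plain average of the energies would lose.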


Although this exponential rate of probability ratio growth is a possibility, it is not guaranteed to 
occur in all cases. So we will now examine the conditions under which exponential ratio growth 
does occur, and conditions under which we can reasonably assume that it would occur if all of the 
energy terms were observable. In the course of doing so, we will present a generalisation of eq. 
(3.26) that reduces to it when all of the energy terms are defined and that, in a sense, preserves 
the possibility of exponential ratio growth when there are varying numbers of undefined energy 


terms. 


Exponential Probability Ratio Growth 


Taking $X$ as the domain of the conditional sampling function $k^{-1} f(x)$ as before, let $\rho_i$, $1 \le i \le |G^{[j]}_{t-1}|$, define an arbitrary ordering over the subregion centres $G^{[j]}_{t-1}$ of a time $t-1$ patch $V(\Theta^{[j]}_{t-1})$, and hence over the energies $\psi_i(x)$ associated with any given $x \in X$. The probability of sampling transformation parameter $x \in X$ from $k^{-1} f$ is determined up to $f$'s normalising factor $k^{-1}$ by the likelihood energies $\psi_1(x), \psi_2(x), \ldots$. Given this energy term ordering, we define the $n$th partial mean energy $\bar\psi_n(x)$ for a state $x$ as

\[ \bar\psi_n(x) \triangleq \frac{1}{n} \sum_{i=1}^{n} \psi_i(x) . \tag{3.34} \]

By eq. (3.26), we can then redefine $f_n(x)$, the unnormalised conditional probability of sampling $x$ when using the first $n$ energies, as follows:

\[ f_n(x) = e^{-n \bar\psi_n(x)} . \tag{3.35} \]

This suggests a new interpretation of $f$'s energies as the product of an exponentiation rate, $\bar\psi_n$, and an order of magnitude, $n$.


Generally speaking, it would be preferable to use as many energy terms as possible. So we can view the normalised sampling distribution under $n$ energy terms as an approximation of a normalised sampling distribution under an infinite number of energy terms, i.e.

\[ k_n^{-1} f_n(x) \approx k_\infty^{-1} f_\infty(x) = \lim_{n \to \infty} k_n^{-1} f_n(x) , \tag{3.36} \]

where $k_n$ is $f_n$'s normalising factor. $k_n^{-1} f_n(x)$ converges in $n$ if the partial mean energies all converge, i.e. if

\[ \bar\psi_\infty(x) = \lim_{n \to \infty} \bar\psi_n(x) \tag{3.37} \]

exists for all $x \in X$*. When convergence occurs, $k_\infty^{-1} f_\infty$ is a distribution that gives equal probability $p > 0$ to all states $x$ at which $\bar\psi_\infty(x)$ is minimal, and probability 0 to all other states.


Suppose that the infinite sequences of energy terms involved in the definition of $\bar\psi_\infty$ correspond to a countably infinite set of subregion centres $G^{[j]}_{t-1}$, and that the ordering imposed on them by $\rho_i$ gradually increases the number of subregion centres that fall within every neighbourhood of every point in the patch. This increase in subregion density increases the density of the image samples used in the definition of $k_n^{-1} f_n(x)$. Since myocardial deformations are generally smooth, it seems reasonable to assume that $\bar\psi_n$, and hence $k_n^{-1} f_n$, will converge in $n$ as the density of the image samples increases.


Assuming that $k_n^{-1} f_n$ does converge in $n$, the ratio of values of $k_n^{-1} f_n$ at states that $k_\infty^{-1} f_\infty$ assigns different probabilities to must eventually tend to $p/0 = \infty$ (or equivalently, to $0/p = 0$). In the following investigation of the conditions under which these ratios diverge at an exponential rate, we will consider $f_n$'s values at two transformation states $x_a \in X$ and $x_b \in X$, and we will adopt the following shorthands:

\begin{align}
\psi_{a,i} &\triangleq \psi_i(x_a) , & \psi_{b,i} &\triangleq \psi_i(x_b) , \\
\bar\psi_{a,n} &\triangleq \bar\psi_n(x_a) , & \bar\psi_{b,n} &\triangleq \bar\psi_n(x_b) . \tag{3.38}
\end{align}

*Note that $\bar\psi_\infty(x)$ is always finite when it exists, since the energy terms always lie in $[0, \psi^\dagger]$.
Ignoring the ambiguous case in which two parameters should be sampled from $k_\infty^{-1} f_\infty$ with equal probability, suppose, without loss of generality, that the inequality

\[ f_n(x_a) > f_n(x_b) \tag{3.39} \]

holds for some value of $n$. Taking negated logs, this then implies that

\[ \bar\psi_{a,n} < \bar\psi_{b,n} . \tag{3.40} \]

Furthermore, since

\[ \psi_a^{-} \le \bar\psi_{a,n} \quad \text{and} \quad \bar\psi_{b,n} \le \psi_b^{+} , \tag{3.41} \]

where

\[ \psi_a^{-} \triangleq \inf_i \psi_{a,i} \quad \text{and} \quad \psi_b^{+} \triangleq \sup_i \psi_{b,i} , \tag{3.42} \]

an upper bound on the ratio of the probabilities of sampling $x_a$ and $x_b$ is

\[ \frac{f_n(x_a)}{f_n(x_b)} \le \frac{e^{-n\psi_a^{-}}}{e^{-n\psi_b^{+}}} = e^{n(\psi_b^{+} - \psi_a^{-})} , \tag{3.43} \]

which clearly grows exponentially in $n$. For a lower bound on the ratio to also grow exponentially in $n$, there must be a real number $a > 0$ and a natural number $n'$ such that

\[ \forall n > n' \; \big(\bar\psi_{b,n} - \bar\psi_{a,n} \ge a\big) \implies \forall n > n' \; \big(n(\bar\psi_{b,n} - \bar\psi_{a,n}) \ge na\big) , \tag{3.44} \]

where we use $n'$ to allow for fluctuations in the sign of $\bar\psi_{b,n} - \bar\psi_{a,n}$ when the number of energy terms is small. When this holds, an exponential lower bound is given by

\[ \forall n > n' \; \left( e^{na} \le e^{n(\bar\psi_{b,n} - \bar\psi_{a,n})} = \frac{f_n(x_a)}{f_n(x_b)} \right) . \tag{3.45} \]
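The two bounds in eqs. (3.43) and (3.45) can be checked numerically on synthetic energy sequences. The sequences below are hypothetical and were chosen so that the partial means differ by more than a margin of 0.5 for every $n$; the ratio is handled in the log domain so that nothing overflows.

```python
import math


def toy_energies(n):
    """Hypothetical energy sequences for states x_a (low) and x_b (high)."""
    a = [1.0 + 0.2 * math.sin(i) for i in range(1, n + 1)]
    b = [2.0 + 0.2 * math.cos(i) for i in range(1, n + 1)]
    return a, b


def log_ratio(energies_a, energies_b):
    """ln(f_n(x_a) / f_n(x_b)) = n * (mean_b - mean_a), from eq. (3.35)."""
    n = len(energies_a)
    mean_a = sum(energies_a) / n
    mean_b = sum(energies_b) / n
    return n * (mean_b - mean_a)


def log_bounds(energies_a, energies_b, margin):
    """Logs of the lower bound e^{n a} and the upper bound e^{n (sup_b - inf_a)}."""
    n = len(energies_a)
    lower = n * margin
    upper = n * (max(energies_b) - min(energies_a))
    return lower, upper


results = {}
for n in (5, 20, 80):
    a, b = toy_energies(n)
    results[n] = (log_ratio(a, b), log_bounds(a, b, 0.5))
```

Both bounds are linear in $n$ in the log domain, so the ratio itself is sandwiched between two exponentials in $n$.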


Considering the fact that we are not always able to observe the same number $n$ of energy terms for each $x$, the convergence of $\bar\psi_n(x)$ in $n$ is a useful assumption to make when trying to evaluate $f(x)$. Together with the assumption that ratios of $f_n$ would grow exponentially in $n$ if we could observe an arbitrary number of energy terms for each $x$, it suggests that we can use the observable energy terms to calculate estimates $\hat\psi(x)$ of $\bar\psi_\infty(x)$ for each $x$, and then combine these estimates with eq. (3.35) to redefine $f(x)$ as

\[ f(x) = e^{-\bar n \hat\psi(x)} , \tag{3.46} \]

where the order of magnitude $\bar n$ is constant over all $x$ and reflects the number of observable energies at each $x$. An obvious choice for $\bar n$ would be the mean number of observable energies. Similarly, if only $m_a$ energy terms were observable for $x_a$, and $m_b$ for $x_b$, we could use the partial means $\bar\psi_{a,m_a}$ and $\bar\psi_{b,m_b}$ as estimates for $\bar\psi_{a,\infty}$ and $\bar\psi_{b,\infty}$.


This definition of $f(x)$ makes use of the exponential ratio growth model in the sense that

\[ \frac{f(x_a)}{f(x_b)} = e^{\bar n(\bar\psi_{b,m_b} - \bar\psi_{a,m_a})} \tag{3.47} \]

may grow exponentially with $\bar n$, depending on the behaviour of $\bar\psi_{a,m_a}$ and $\bar\psi_{b,m_b}$ as they converge in $m_a$ and $m_b$. In particular, if, as a generalisation of eq. (3.44), $\bar\psi_{a,m_a}$ and $\bar\psi_{b,m_b}$ are always separated by a margin $a' > 0$ when $m_a$ and $m_b$ are sufficiently large, i.e. if

\[ \min(m_a, m_b) > m' \implies \bar\psi_{b,m_b} - \bar\psi_{a,m_a} \ge a' \tag{3.48} \]

for some $m'$, then $e^{\bar n a'}$ is a lower bound on $f(x_a)/f(x_b)$ that grows exponentially in $\bar n$.


The case in which the exponential ratio growth assumption is most helpful in minimising uncertainty is the case in which $\bar\psi_{a,m_a} \ll \bar\psi_{b,m_b}$, suggesting that $x_a$ is closer to a mode of $k_\infty^{-1} f_\infty$ than $x_b$. In this case, provided that $m_a$ and $m_b$ are large enough for the partial means to be insensitive to fluctuations in their energy term sequences, the energies $\psi_{a,1:m_a}$ are likely to be in close agreement, and the sign of $\bar\psi_{b,m_b} - \bar\psi_{a,m_a}$ is likely to be the same as the sign of $\bar\psi_{b,m'_b} - \bar\psi_{a,m'_a}$, for all $m'_a > m_a$ and $m'_b > m_b$. So it seems reasonable to assume that

\[ \min(m_a, m_b) > m' \,\wedge\, \bar\psi_{a,m_a} \ll \bar\psi_{b,m_b} \implies \forall m'_a, m'_b > \min(m_a, m_b) \; \big(\bar\psi_{a,m'_a} < \bar\psi_{b,m'_b}\big) , \tag{3.49} \]

and to take the truth of the antecedent (for some appropriately large $m'$) as evidence that $f(x_a)/f(x_b)$ is an exponential function of $\bar n$ (since the truth of eq. (3.49) implies the existence of a margin between $\bar\psi_{a,m_a}$ and $\bar\psi_{b,m_b}$, as in eq. (3.48)).

When $\bar\psi_{b,m_b}$ and $\bar\psi_{a,m_a}$ are close in value, there is less certainty as to what their ordering will be as $\min(m_a, m_b)$ increases, and so a variant of eq. (3.49) in which $\ll$ is replaced with $<$ is less likely to be true. However, if both $x_a$ and $x_b$ are far from the modes, an exponential growth assumption will still lead to the most important behaviour of reducing the probabilities of $x_a$ and $x_b$ relative to the probabilities near the modes. If they are close to modes, then we have an ambiguous case that we cannot reliably deal with without additional information.


The final case to be dealt with is the pathological case in which no energies are observable for some states $X_u \subset X$. In this case, we have no information about the states in $X_u$, and so the best we can do is to take the most noncommittal approach, in which no state of $X_u$ is considered more likely than another, and the probability of sampling from $X_u$ maximises the uncertainty of $f$. In [53], Jaynes advocates the use of Shannon's information entropy,

\[ H \triangleq -\sum_{x \in X} k^{-1} f(x) \ln\big(k^{-1} f(x)\big) , \tag{3.50} \]

as the measure of uncertainty to maximise in such situations, and he sketches a derivation of the measure from three weak axioms that an uncertainty measure should satisfy. The most intuitively necessary properties of $H$ for our purpose are that it gives a measure of 0 to delta functions, that it varies continuously with the probabilities of the distribution in question, and that it is maximised by the uniform distribution. Thus, the problem of selecting an uncertainty-maximising probability to assign to the $x \in X_u$ can be thought of as the problem of selecting a probability that will bring $f$ as close as possible to a uniform distribution.
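As a numerical illustration of this selection problem, the sketch below fixes the unnormalised values of $f$ on the states with observable energies, assigns a common value $v$ to the remaining states, and searches for the $v$ that maximises the entropy of the normalised distribution. The closed form it is compared against, $\ln v = \sum f \ln f / \sum f$, is obtained here by setting $\mathrm{d}H/\mathrm{d}v = 0$; it is an assumption of this sketch and is not quoted from the thesis's own proof. All names and values are hypothetical.

```python
import math


def entropy(values):
    """Shannon entropy of the distribution obtained by normalising `values`."""
    k = sum(values)
    return -sum((v / k) * math.log(v / k) for v in values if v > 0.0)


def max_entropy_fill(known):
    """Stationary point of the entropy with respect to the common fill value v."""
    return math.exp(sum(v * math.log(v) for v in known) / sum(known))


def grid_search_fill(known, missing, step=0.001, upper=2.0):
    """Brute-force search for the fill value that maximises the entropy."""
    best_v, best_h = None, -math.inf
    for i in range(1, int(upper / step) + 1):
        v = i * step
        h = entropy(known + [v] * missing)
        if h > best_h:
            best_v, best_h = v, h
    return best_v


known = [1.0, 0.5, 0.1]  # unnormalised f on states with observable energies
closed_form = max_entropy_fill(known)
searched = grid_search_fill(known, missing=2)
```

Note that the maximising value does not depend on how many states are missing, only on the values of $f$ at the observed states.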


Handling Underflow 


Whilst the preceding section focused on the issue of too few defined energy terms contributing to the probability that $k^{-1} f$ assigns to some state $x \in X$, the opposite problem of too many defined energy terms contributing to $f(x)$ is equally important, as it can result in underflow (rounding to zero) when the energies are negated and exponentiated. For double precision floating-point numbers, the maximum number of subregions that can contribute maximal energy before the likelihood product falls into the denormalised (i.e. reduced precision) range is $\lfloor 1022 \ln 2 / \psi^\dagger \rfloor$, and the maximum number that can contribute maximal energy before underflow occurs is $\lfloor 1074 \ln 2 / \psi^\dagger \rfloor$ (this follows from the fact that the smallest normalised number is $2^{-1022}$ and the smallest denormalised number $> 0$ is $2^{-1074}$; see [39]). We set $\psi^\dagger = 30$ for all results presented in this thesis, which means that 24 subregions are sufficient to cause denormalisation, and 25 are sufficient to cause underflow. Working in the denormalised range is undesirable, for reasons related to numerical accuracy, but underflow can have the destructive effect of causing states of $X$ that should have high probability after normalising $f$ to end up with 0 probability, and can even cause $k^{-1} f$ to be an invalid distribution (when underflow occurs at all states of $X$).
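The subregion counts quoted above can be reproduced directly in double precision arithmetic. The helper name below is hypothetical; the constants 1022 and 1074 are the exponents of the smallest normalised and denormalised doubles.

```python
import math
import sys


def max_terms_before(exponent, max_energy):
    """Largest n for which exp(-n * max_energy) is still at least 2 ** -exponent."""
    return math.floor(exponent * math.log(2) / max_energy)


max_energy = 30.0
before_denormal = max_terms_before(1022, max_energy)   # still a normalised number
before_underflow = max_terms_before(1074, max_energy)  # still non-zero

normal_value = math.exp(-before_denormal * max_energy)
denormal_value = math.exp(-(before_denormal + 1) * max_energy)
underflowed_value = math.exp(-(before_underflow + 1) * max_energy)
```

Here `sys.float_info.min` is the smallest positive normalised double, so any positive value below it lies in the denormalised range.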


The simplest way to prevent $f$ from assigning denormalised values to the highest probability states is to uniformly rescale the values it assigns to all states by adding a value $c$ to all of the negated partial mean energy terms. This is a valid transformation, as any factor by which $f$ is scaled will disappear after normalisation. The precise value chosen for $c$ is not critical, but needless to say, it should not be so high that it causes $f$ to overflow! In our implementation, we define $c$ as the minimum partial mean energy over all states of $X$, so that $f$ always assigns a value of 1 to the modes before normalisation. Under this choice of $c$, multiplying $f$ by $k^{-1}$ could only ever cause the probabilities at the modes to enter the denormalised range if $k > 2^{1022}$, which, even if $f$ was uniform, would require $|X|$ to be about 1000 orders of magnitude (in base 2) larger than it typically is!


If $f$ assigns a small value $\epsilon$ to a non-modal state, the precise condition under which $k^{-1}\epsilon$ will be a denormalised number depends on the values that $f$ assigns to the other states. However, we can estimate a lower bound $\epsilon^\dagger$ on $\epsilon$, above which multiplying by $k^{-1}$ does not cause denormalisation, by assuming the most extreme case, in which $f$ assigns value 1 to the remaining $|X| - 1$ states, in which case $k = |X| + \epsilon - 1$. $k^{-1}\epsilon$ is not a denormalised number if

\[ \frac{\epsilon}{|X| + \epsilon - 1} \ge 2^{-1022} \tag{3.51} \]

in floating-point arithmetic, suggesting that

\[ \epsilon^\dagger = 2^{-1022}(|X| - 1) . \tag{3.52} \]

The sampling distributions that we calculate for displacement have the largest domains $X$. E.g. in the next section's results $|X| = 1681$, for which $\epsilon^\dagger \approx 2^{-1011}$. Given that our rescaled $f$ always assigns value 1 to the modes, the probability of sampling any state to which $f$ assigns a value less than this bound is so low that it is immaterial whether or not these values of $f$ enter the denormalised range or underflow.
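The rescaling and the bound can both be illustrated briefly. In this sketch the partial mean energies and the order of magnitude are hypothetical values chosen so that the unscaled exponentials underflow at every state; subtracting the minimum energy before exponentiating keeps the mode at exactly 1.

```python
import math


def rescaled_probs(mean_energies, order):
    """Normalised probabilities computed after shifting energies by their minimum."""
    shift = min(mean_energies)
    f = [math.exp(-order * (e - shift)) for e in mean_energies]
    k = sum(f)
    return [value / k for value in f]


def denormal_bound_log2(domain_size):
    """log2 of the bound 2 ** -1022 * (|X| - 1) below which k^-1 * f may be denormalised."""
    return -1022 + math.log2(domain_size - 1)


mean_energies = [30.0, 29.0, 25.0]
order = 40
naive = [math.exp(-order * e) for e in mean_energies]  # every entry underflows to 0.0
probs = rescaled_probs(mean_energies, order)
bound_exponent = denormal_bound_log2(1681)
```

Without the shift the normalising constant would be zero and the distribution invalid; with it, the mode receives essentially all of the probability mass.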


Summary 


To summarise all of this mathematically, for each $x \in X$, let $I_d(x)$ contain $G^{[j]}_{t-1}$'s grid point indices $i \in \{1, \ldots, |G^{[j]}_{t-1}|\}$ for which $\psi_i(x)$ is defined, and let $I_u(x)$ contain the grid point indices for which $\psi_i(x)$ is undefined. In addition, let $X_d = \{x \in X : |I_d(x)| > 0\}$ (see figure 3.8 for an illustration of the relationship between masked out image regions and the construction of these sets). We redefine $f(x)$ as

\[ f(x) = \begin{cases} e^{-\bar n(\bar\psi(x) - \psi^{\min})} , & x \in X_d , \\ e^{-\psi^{u}} , & \text{otherwise} , \end{cases} \tag{3.53} \]

where

\begin{align}
\bar n &= \frac{1}{|X_d|} \sum_{x' \in X_d} |I_d(x')| , \qquad \bar\psi(x) = \frac{1}{|I_d(x)|} \sum_{i \in I_d(x)} \min\big(\psi_i(x), \psi^\dagger\big) , \qquad \psi^{\min} = \min_{x \in X_d} \bar\psi(x) , \\
\psi^{u} &\triangleq \begin{cases} \dfrac{\sum_{x \in X_d} f(x)\, \bar n(\bar\psi(x) - \psi^{\min})}{\sum_{x \in X_d} f(x)} , & |X_d| > 0 , \\ 0 , & \text{otherwise} . \end{cases} \tag{3.54}
\end{align}

If $\psi_i(x)$ is defined for all $i$ and $x$, $\bar n = |G^{[j]}_{t-1}| = |I_d(x)|$ for all $x$, and $f(x)$ is scaled by $e^{\bar n \psi^{\min}}$ for all $x$. So up to a scaling factor, eq. (3.53) reduces to eq. (3.26) as required. In appendix B.4, we prove that when $x \notin X_d$, $f(x) = e^{-\psi^{u}}$ maximises the entropy of $k^{-1} f$. Note that in general, when defining the order of magnitude term $\bar n$, we only average the number of observable energies over each $x \in X_d$, since we treat $x \in X - X_d$ separately.
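The summary can be turned into a short sketch. It makes several assumptions that should be read as such: undefined energies are represented by `None`, the states are indexed by position in a list, and the value assigned to states with no observable energies is the $f$-weighted mean energy that follows from maximising the entropy, which is this sketch's reading of the summary rather than a transcription of the thesis's `PropagateParticles` helpers.

```python
import math


def sampling_weights(energies, max_energy):
    """Unnormalised f(x) for each state, with energies[x][i] = None where undefined."""
    defined = [[min(e, max_energy) for e in row if e is not None] for row in energies]
    observed = [row for row in defined if row]
    if not observed:
        return [1.0] * len(energies)  # no information at all: uniform weights
    order = sum(len(row) for row in observed) / len(observed)  # mean number of defined terms
    means = [sum(row) / len(row) if row else None for row in defined]
    lowest = min(m for m in means if m is not None)
    exponents = [order * (m - lowest) if m is not None else None for m in means]
    known = [(math.exp(-e), e) for e in exponents if e is not None]
    # Entropy-maximising energy for states with no observable terms (assumed form).
    fill = sum(f * e for f, e in known) / sum(f for f, _ in known)
    return [math.exp(-e) if e is not None else math.exp(-fill) for e in exponents]


complete = [[1.0, 2.0], [0.5, 0.5], [3.0, 40.0]]
weights_complete = sampling_weights(complete, 30.0)
reference = [math.exp(-sum(min(e, 30.0) for e in row)) for row in complete]

partial = [[1.0, None], [None, None], [0.5, 0.5]]
weights_partial = sampling_weights(partial, 30.0)
```

When every energy is defined, the weights differ from the plain product of local likelihoods only by a constant factor, so the normalised distributions coincide.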


The whole process of constructing the importance sampling distributions and using them to 
propagate the states of the particles is summarised in the function PropagateParticles and its 


helper functions, which are given in appendix C. 
3.4 Results: Exponential Likelihood Ratio Growth Study

The approach to handling undefined local likelihood energies that we presented in the previous section relied on a number of assumptions about the behaviour of the partial means as the number of defined energy terms grows. Our aim in this section is to provide empirical evidence for the correctness of these assumptions. In particular, the two most significant assumptions that we wish to validate are:

1. the assumption that the $i$th partial mean $\bar\psi_i(x)$ of any state $x$ is likely to converge as $i$ increases;

2. eq. (3.49), which, if correct, places an exponential lower bound on the growth of the ratio between the importance sampling probabilities of pairs of states $x_a$ and $x_b$ with very different $m_a$th and $m_b$th partial mean energies.

To test the validity of the two assumptions, we performed a series of experiments based on the defined parts of the discrete, bounded-displacement energy functions $\psi^{[j]}_t$ that determine the importance sampling distributions for the particles of 4 patches that we tracked in 2 videos, each from a different patient. All of the experiments in this thesis are based on the same two videos, which we refer to as “230108.FO1” and “180608.HS”*. To track the patches, we defined the local likelihood $\ell'$ and patch likelihood $L$ in terms of the dissimilarity functions $\Delta_\psi$ and $\Delta_L$ that we will discuss in the next chapter, and we determined their hyperparameters ($\alpha_\psi$ and $\gamma_\psi$ for $\ell'$, and $\alpha_L$ and $\gamma_L$ for $L$) using the simulation-based maximum likelihood approach that we will describe in chapter 6.


In that chapter, we will also describe methods for resampling the particle sets when particles
begin to drift. However, we did not perform any resampling for the tests we are about to
describe, so as to maximise the number of distinct importance sampling distributions that each
patch’s particles used (bearing in mind that we cache and reuse the displacement distributions
for patches that share hash keys, as we described earlier), and hence maximise the amount


*See the results section at the end of chapter 5 for a statistical summary of the amount of motion/deformation 
in these videos. 
Parameter                                                     230108.FO1   180608.HS

No. particles                                                 50           50
Initial patch side length                                     100p         80p
Grid point spacing                                            15p          10p
Subregion radius, $r_s$                                       8p           7p
$\psi$ sample spacing                                         2p           2p
Max. displacement, $d^\dagger$                                40p          40p
Max. rotation, $\phi^\dagger$                                 10°          10°
Max. scale, $s^\dagger$                                       1.25         1.25
Max. skew, $\kappa_v^\dagger$, $\kappa_h^\dagger$             0.2          Ae
Max. energy, $\psi^\dagger$                                   30           30
No. $\mathcal{E}$ histogram bins, $N$                         256          256
$\mathcal{E}$ outlier cutoff for the local likelihood, $p$    0.5          0.5
$\mathcal{E}$ outlier cutoff for $L$, $p$                     0.75         0.75


Table 3.1: The values of the particle filter parameters that we manually defined. The unit p
stands for “pixels”. Unless otherwise stated, we use the same values for all tests in this thesis.
Note: $\mathcal{E}$ is a dissimilarity function that we will use to define $\Delta_\psi$ and $\Delta_L$ in the next chapter.


Seq.          Percentile
              0%    25%   50%   75%   100%

230108.FO1    20    41    46    54    92
180608.HS     13    48    60    69    108


Table 3.2: The percentiles of the number of subregions that defined the particles’ sampling
distributions.


of data available for analysis. The fact that not resampling means that some particles would
have been in states of low posterior probability does not matter for these tests, nor does the
accuracy with which the patches were tracked, as for these tests we are only interested in the
effect that the number of defined energy terms has on the probabilities that the importance
sampling distributions attach to different transformations between consecutive frames. This is
an effect that can be measured without regard to the particular states that the particles are
in at the first frame of each pair. Note also that not resampling means that the influence of
the patch likelihood function $L_t$ on the state of the particles was very weak. The only point at
which it exerted any influence at all was during the calculation of the weighted averages of the
particles that shared hash keys, which, as discussed in section 3.3.3, is only done so that the
particles in each group can use a consistent set of grid points.
The parameters that we manually defined for all components of the particle filter are
summarised in table 3.1, and the distributions of the number of subregions that defined each
calculated sampling distribution are summarised with the percentiles given in table 3.2. None
of the $\approx 321 \times 10^6$ energy terms that we calculated for 230108.FO1 were as large as the upper
bound $\psi^\dagger$, and only 0.009% of the $\approx 335 \times 10^6$ energy terms that we calculated for 180608.HS
reached it, so the effect of energy capping on the conclusions we will draw from the tests is
negligible.


Including the initial frames, we tracked the 4 patches over 20 frames in the 230108.FO1
sequence, and over 19 frames in 180608.HS (see figure 3.9 for examples of the tracked patch states
from 5 frames of each sequence). As we do not cache the skewing, scaling and rotation sampling
distributions, the number of times each of these three types of distribution was calculated is
given by

(no. patches) × (no. particles) × (no. frames distributions calculated over) . (3.55)

For 230108.FO1, this is $4 \times 50 \times (20 - 1) = 3800$, and for 180608.HS, this is $4 \times 50 \times (19 - 1) = 3600$.
Note, however, that this is not quite the number of distinct instances of each type of distribution,
since the particles were all in the same state in the initial frame. We have not checked the exact
number of distinct instances, but it should be about $3800 - 49$ for 230108.FO1 and $3600 - 49$ for
180608.HS, since the probability of two particles ending up in exactly the same state
after the initial frame (without resampling) may, for most practical purposes, be considered
to be 0. There were fewer displacement sampling distributions, however, due to caching. We
calculated 2354 distinct displacement distributions for 230108.FO1, and 2610 for 180608.HS.


To justify the assumptions we made about the behaviour of the partial means, we considered
two sets of empirical bivariate distributions constructed from the energy data for each video
sequence. For each state $x$ of a sampling distribution $f$, let $n_f(x)$ be the number of defined
energy terms that the subregions contribute to $f(x)$. For each $f$ and $x$, we uniformly sampled
a random permutation $P_{n_f(x)}$ over the subregions using the algorithm described in [31], and
Figure 3.9 panels: (b)-(e) 230108.FO1, frames 4, 8, 12 and 16; (f)-(j) 180608.HS, frames 37, 41, 45, 49 and 53.

Figure 3.9: 5 frames from the 20/19 frames over which we tracked four patches in 230108.FO1/180608.HS. Each rendered patch is the
weighted average of its particles. The green squares denote the manually-defined initial positions of the patches. The (mostly blue) clouds
of crosses around the centres of each patch denote the positions of the centres of the particles.


we calculated all $n_f(x)$ partial means*. In a slight change to the previous section’s notation, we
will now use $\bar{\psi}_{x,i}$ to refer to the partial mean at state $x$ of sampling distribution $f$, calculated
using the first $i$ defined energies contributed by subregions chosen in the order prescribed by
$P_{n_f(x)}$. For the most part, we will omit $\bar{\psi}_{x,i}$’s dependence on $f$, to avoid notational clutter.
But in cases where it is necessary to make this dependence explicit, we will write $\bar{\psi}^f_{x,i}$.

The first set of empirical distributions we considered consisted of a distribution for each type
of importance sampling distribution over the pairs

$$\left( \frac{i}{n_f(x)} ,\; \max_{j \in \{i, \dots, n_f(x)\}} \frac{\left| \bar{\psi}_{x,j} - \bar{\psi}_{x,n_f(x)} \right|}{\bar{\psi}_{x,n_f(x)}} \right) \qquad (3.56)$$

for each $x$ and $f$ of the given type, and all $i \in \{1, \dots, n_f(x)\}$. This set allows us to examine
the validity of the assumption that the partial means are likely to converge as the number of
energy terms increases, by providing data about how great the absolute relative error (ARE)
in the estimates of the $n_f(x)$th partial mean may still be when no fewer than $i$ energy terms are
used.


The histograms that resulted from calculating these pairs from the partial means for the
displacement, rotation, scaling and skewing sampling distributions for both test videos are shown
in subfigures (a), (c), (i) and (k) of figures 3.11 and 3.12. Beneath each subfigure, we have
plotted conditional histogram percentiles, each of which was defined by restricting the pairs to
those for which $i / n_f(x)$ fell within a specific bin (for all tests in this section, we partitioned the
range of $i / n_f(x)$, the unit interval $[0,1]$, into 100 bins). Let

$$i_f(x, u) \triangleq \lceil n_f(x)\, u \rceil \qquad (3.57)$$


*For each state $x$, given the condition that exactly $n_f(x) - i$ energy terms are undefined, the $i$th partial mean
(which we take as an approximation of the $n_f(x)$th) is a random variable that fluctuates according to which $i$
of the $n_f(x)$ subregions contribute the defined energy terms. The most accurate way to simulate the process of
drawing samples from the $i$th partial mean’s distribution would involve repeatedly drawing random samples of
plausible occlusion masks from some appropriate distribution until exactly $n_f(x) - i$ energy terms are undefined
for state $x$. As a simpler and significantly more efficient proxy for this process, we take the most conservative
approach of assuming all possible configurations in which exactly $i$ subregions contribute defined energy terms
to be equally probable.
be the minimum number of energy terms that must have contributed towards a partial mean
at state $x$ of sampling distribution $f$ when the proportion of used energy terms was at least
$u \in [0,1]$. The passing of the $100p$th percentile curve through a point $(u,v)$ indicates that $v$ is
the supremum, over $100p\%$ of the upper partial mean subsequences $\bar{\psi}^f_{x,\,i_f(x,u):n_f(x)}$, of the AREs
between each subsequence term $\bar{\psi}_{x,j}$ and $\bar{\psi}_{x,n_f(x)}$.

As expected, the percentiles show that the proportion of data samples with AREs greater than
any given value decreases as the minimum number of energy terms used to calculate the partial
means tends to $n_f(x)$. For 230108.FO1, only 10% of the upper partial mean subsequences for
the displacement distributions had AREs greater than 0.1 when $0.3 n_f(x)$ energy terms were
used. When $0.56 n_f(x)$ energy terms were used, only 1% exceeded an ARE of 0.1, and 90% had
AREs < 0.06. The convergence was faster for the scaling distributions, for which $0.51 n_f(x)$
energy terms were sufficient to bring 99% of the partial means within an ARE of 0.1, and faster
still for the rotation and skewing distributions, for which $0.44 n_f(x)$ and $0.41 n_f(x)$ energy terms
sufficed respectively. The results for 180608.HS were similar.
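
The construction of the ARE pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the experiments: the function names (`partial_means`, `are_pairs`, `min_terms`) are hypothetical, Python's `random.shuffle` stands in for the permutation algorithm of [31], the energies are assumed to be positive, and the second element of each pair is taken to be the largest ARE over the upper subsequence of partial means.

```python
import math
import random

def partial_means(energies, rng):
    """Return the partial means of `energies` taken in a uniformly random order."""
    order = list(energies)
    rng.shuffle(order)              # stand-in for the permutation sampler of [31]
    means, total = [], 0.0
    for i, energy in enumerate(order, start=1):
        total += energy
        means.append(total / i)     # i-th partial mean
    return means

def are_pairs(means):
    """Pairs (i/n, worst ARE among partial means i..n relative to the n-th)."""
    n = len(means)
    final = means[-1]               # assumed non-zero (positive energies)
    pairs, worst = [], 0.0
    for i in range(n, 0, -1):       # walk backwards to keep a running maximum
        worst = max(worst, abs(means[i - 1] - final) / abs(final))
        pairs.append((i / n, worst))
    pairs.reverse()
    return pairs

def min_terms(n, u):
    """i_f(x, u) = ceil(n * u): the fewest terms giving a used proportion of at least u."""
    return math.ceil(n * u)
```

Binning the first element of every pair into 100 bins and taking percentiles of the second element within each bin would then give curves of the kind described above.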


The fact that the percentiles did not converge at similar rates across the four types of sampling
distribution may be a result of the large differences in the sizes of the domains of each of these
sampling distributions, and hence in the number of partial means that each type of sampling
distribution contributed to the data sets. All of the bounded-displacement energy functions,
and hence the displacement sampling distributions, were sampled at $(d^\dagger + 1)^2 = 1681$ points
(under the parameter values in table 3.1). The exact numbers of points at which the other three
sampling distribution types were sampled varied, due to the automatic selection of the sample
spacing that we describe in appendix B.3, but the numbers were one order of magnitude smaller than
1681 for the scaling distributions, and two orders smaller for the skewing and rotation distributions.

For each sampling distribution $f$, let $x^*_f$ denote the state of minimal $n_f(x)$th partial mean, i.e.:

$$x^*_f \triangleq \arg\min_{x} \bar{\psi}^f_{x,n_f(x)} . \qquad (3.58)$$
The second set of empirical distributions we considered consisted of four distributions (again,
one for each type of importance sampling distribution) over the pairs


5 (3.59) 


for all $x \ne x^*_f$ in each $f$’s domain, and $i \in \{1, \dots, \min(n_f(x), n_f(x^*_f))\}$. The construction of
this set provides evidence for the correctness of eq. (3.49) when comparing the partial mean
energies of each sampling distribution $f$’s states to those of $x^*_f$, as the second element of each
pair is the relative margin that separates the partial means at $x^*_f$ from those at another state $x$
of $f$’s domain, when the partial means at both states use at least $i$ energy terms. The relative
margin is defined as
/ 


a 
———— 3.60 
mas Ds | ie 

f? 
where $\alpha'$ is the greatest value for which

$$\bar{\psi}_{x,c} - \bar{\psi}_{x^*_f,d} \ge \alpha' , \quad \forall (c,d) \in \{i, \dots, \min(n_f(x), n_f(x^*_f))\}^2 \qquad (3.61)$$

(see figure 3.10). When $\alpha' > 0$, it can be taken as an estimate of the $\alpha$ variable used in
eq. (3.44) that defined the minimum rate of exponential growth in sampling probability ratios.
When $\alpha' < 0$, it suggests that either $i$ is not a large enough number of energy terms to overcome
the fluctuations in the ordering of the partial means at states $x$ and $x^*_f$, or that the ratio between
the sampling probabilities at these states will not grow exponentially.
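
As an illustration, the margin can be computed directly from two lists of partial means. The sketch below assumes that the inequality in eq. (3.61) must hold for every pair of indices (c, d) drawn independently from the stated range, in which case the greatest admissible margin is the smallest partial mean at the comparison state minus the largest partial mean at the minimal-energy state. The function name `margin` is hypothetical, and the normalisation of eq. (3.60) is deliberately not reproduced.

```python
def margin(means_x, means_star, i):
    """Greatest margin separating partial means i..n at state x from those at x*.

    `means_x` and `means_star` hold the partial means at the two states
    (index 0 is the first partial mean); n is the shorter of the two lengths.
    """
    n = min(len(means_x), len(means_star))
    if not 1 <= i <= n:
        raise ValueError("i must lie between 1 and the shorter sequence length")
    lowest_x = min(means_x[i - 1:n])           # smallest partial mean at x
    highest_star = max(means_star[i - 1:n])    # largest partial mean at x*
    return lowest_x - highest_star
```

A negative return value corresponds to the case in which the two sets of partial means still overlap when at least `i` energy terms are used.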


The histograms for this set of empirical distributions are shown in subfigures (b), (d), (j) and
(l) of figures 3.11 and 3.12, once again with the conditional histogram percentile curves plotted
beneath them. This time, the passing of the $100p$th percentile curve through a point $(u,v)$
indicates that for $100(1-p)\%$ of the upper partial mean subsequences $\bar{\psi}^f_{x,\,i_f(x,u):n_f(x)}$ with $x \ne x^*_f$,
the infimum on the relative margins for which eq. (3.61) holds is $v$. The value of $v$ should
be > 0 for most subsequences when $u$ is sufficiently high, since, under the evidence for the
convergence of partial means provided by the first set of empirical distributions, there should
be little variation in the partial means that use at least $i_f(x,u)$ energy terms. The larger the
Figure 3.10: An illustration of the margin $\alpha'$ that separates the partial means $\bar{\psi}_{x,i:n}$ from the
partial means $\bar{\psi}_{x^*_f,i:n}$, for $i = i_f(x,u)$ (for some arbitrary $u$) and $n = \min(n_f(x), n_f(x^*_f))$. The
latter partial means are indicated by the green markers on the left (the dot marks the $n_f(x^*_f)$th),
and the former are indicated by the red markers on the right. The error bars on the two groups
indicate the regions about the dots within which the other partial means must lie, as dictated
by the 100th percentile curve of the ARE histograms. Increasing the value of $u$ decreases the
width of these bounds, and hence increases the margin $\alpha'$.


proportion $(1-p)$ of subsequences for which $v > 0$, the stronger the evidence is for a restricted
version of eq. (3.49), in which the constraint $m_a, m_b \le \min(n_f(x), n_f(x^*_f))$ is imposed on the
consequent, and $a = x^*_f$, $b = x$ and $i = i_f(x,u) - 1$.


The percentile curves confirm that the relative margin generally does increase as $u$ tends to
1. Reading the curves from right to left gives an empirical estimate of the rate at which the
probability of the exponential sampling ratio growth hypothesis being true decreases with $u$,
when comparing states to the state of minimal mean energy. For example, for 230108.FO1’s rotation
distributions, 90% of the relative margins were non-negative when the partial means were
defined with at least $0.9 n_f(x)$ energy terms, and this fell to 75% with at least $0.56 n_f(x)$ energy
terms, and 50% with at least $0.18 n_f(x)$ energy terms.

In both videos, the percentile curves for the rotation, scaling and skewing distributions grew
more slowly and covered narrower ranges than the percentile curves for displacement. For example, for
the displacement distributions for the 230108.FO1 video, 99% of the partial means using at
least $0.16 n_f(x)$ energy terms had non-negative relative margins, and 90% had relative margins
> 0.2 when at least $0.3 n_f(x)$ energy terms were used. In comparison, the scaling distributions
required at least $0.6 n_f(x)$ energy terms to be used in order for 90% of the partial means to
have non-negative relative margins, and the rotation and skewing distributions required at least
$0.9 n_f(x)$ and $0.94 n_f(x)$ energy terms, respectively. Again, these differences may be due to the
large differences in the sizes of the domains of the different types of sampling distribution, which
meant that the domain of the displacement distributions had a higher proportion of states far
away from $x^*_f$ than the domains of the other types of distribution.


3.5 Conclusion 


We have introduced a new importance sampling method inspired by Hinton’s Product-of- 
Experts model. By multiplicatively combining likelihood information from multiple subregions, 
our method constructs distributions from which the displacement, rotation, skewing and scaling 
components of a particle’s state can be sampled. We have empirically validated the hypothesis 
that, relative to the mode, the sampling probabilities of the vast majority of non-modal states 
in these distributions grow exponentially with the number of defined energy terms that the 
subregions contribute, and we have devised a simple method for handling missing likelihood 


data based on this hypothesis. 
Figure 3.11: (a), (c), (i), (k) Histograms of the joint distributions of the pairs from eq. (3.56),
for the 230108.FO1 video. In all subfigures, the x axis denotes values of $i / n_f(x)$. The y axes have
been partitioned into 3000 bins, and the x axes into 100; we do this with all histograms in the
rest of this section. The colour key follows a logarithmic scale, and no samples fell into the
black regions. (b), (d), (j), (l) Histograms of the joint distributions of the pairs from eq. (3.59).
(e)-(h), (m)-(p) The 1st, 10th, 25th, 50th, 75th, 90th and 99th percentiles of the histograms above
them, under the condition that $i / n_f(x)$ lies in a specific bin.

Figure 3.12: The figures here have a similar interpretation to those in figure 3.11, except that
these figures are for the 180608.HS video.


Chapter 4 


Likelihood Evaluation 


4.1 Synopsis 


In the previous chapter, we treated the function $\Delta_\psi$, that defines the local likelihood energies,
as a black box, and we did not touch on the form of the patch likelihood function $L_t$. For the
particle filter to be successful, we must define these functions in a way that makes them invariant
to changes in the appearance of a patch that are not caused by myocardial deformations,
and insensitive to appearance changes caused by the non-affine components of myocardial
deformations (since our importance sampler currently only hypothesises affine deformations).


The most significant non-deformational appearance changes that we must take into account 
are those caused by specular reflections and illumination changes. We handle these by carrying 
out two different image processing operations. For the specular highlights, we mask out and 
ignore regions of high intensity and low saturation. We remove the effects of illumination 
changes by fitting a parametric surface with few degrees of freedom to the lightness channel of 
an LUV colour space representation of the image, and then subtracting the surface from the 
channel. This has the effect of preserving fine textural details while attenuating lower-frequency 


shading-related lightness variations. 
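
To make the first of these operations concrete, the following sketch labels pixels of high intensity and low saturation. It is only an illustration: the HSV colour model, the threshold values and the function name `specular_mask` are assumptions made for the example, and are not taken from the implementation described in this chapter.

```python
import colorsys

def specular_mask(pixels, min_value=0.9, max_saturation=0.2):
    """Return a binary mask with 1 where a pixel looks like a specular highlight.

    `pixels` is a list of rows of (r, g, b) tuples with components in [0, 1].
    Both thresholds are illustrative guesses, not tuned values.
    """
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            _, saturation, value = colorsys.rgb_to_hsv(r, g, b)
            bright = value >= min_value
            washed_out = saturation <= max_saturation
            mask_row.append(1 if bright and washed_out else 0)
        mask.append(mask_row)
    return mask
```

Pixels labelled 1 would then be ignored when patches are compared.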


After applying these operations, we then compare patches based on the part of the (weighted)
distribution of squared differences between corresponding pixels that lies below a certain
percentile. By discarding the part of the distribution that lies above the percentile, we hope to
introduce a degree of insensitivity to the non-affine component of the deformations, and to
provide extra resilience against specular highlights that are not detected by the mask.

Figure 4.1: Three images showing the kinds of changes in the appearance of an image region
that the dissimilarity functions must be able to cope with. (a) The original region. (b) An
illumination change. (c) Occlusion by specular highlights.


4.2 Robust Image Region Comparison 


As with the local likelihoods, we define $L_t$ in terms of a non-negative dissimilarity function
$\Delta_{L,s,t}[V(\Theta_s), V(\Theta_t)]$, that compares the appearance of patch $V(\Theta_s)$ in image $\mathcal{I}_s$ to the
appearance of patch $V(\Theta_t)$ in image $\mathcal{I}_t$, and we model $L_t$ as a stretched exponential function of it:

$$L_t(\Theta_t; \Theta_0, \mathcal{I}_{0:t}) \propto \Upsilon(\Delta_{L,0,t}[V(\Theta_0), V(\Theta_t)]; \alpha_L, \gamma_L) \qquad (4.1)$$

for hyperparameters $\alpha_L$ and $\gamma_L$, whose selection we will deal with in chapter 6 (the
constant of proportionality is unimportant for now, as it disappears when the particle weights are
normalised). For now we will just focus on the forms of the dissimilarity functions.


Although myocardial deformations are a significant cause of changes in the appearance of
patches, they are certainly not the only cause. Any model of the dissimilarity functions that
define the local likelihood functions and the patch likelihood function must address the same
question: given the hypothesised trajectory of the patch up until time $t$, what is the probability
that (some region of) image $\mathcal{I}_t$ looks the way it does? In answering this question, the
dissimilarity functions must take into account the kinds of non-deformational appearance changes
illustrated in figure 4.1, namely: changes in illumination/shading and partial or total occlusion
by specular highlights (and possibly other objects, such as surgical tools). To cope with these
phenomena, we define the dissimilarity functions in terms of two image filters:


• A binary occlusion mask $F_{O,t}(x)$, that attempts to label each pixel $x$ that lies in an
occluded region of $\mathcal{I}_t$ with a value 1, and all other pixels with a value 0.

• A lightness-normalising filter $F_{C,t}(x)$, that attempts to produce a representation of each
pixel $x$ of $\mathcal{I}_t$ that is invariant to changes in shading and illumination.

Given these filters, we define both $\Delta_\psi$ and $\Delta_L$ in terms of a function $\mathcal{E}_{s,t}$, that measures the
dissimilarity between regions $X_s$ and $X_t$ of $\mathcal{I}_s$ and $\mathcal{I}_t$ respectively, where the correspondence
between points of $X_s$ and points of $X_t$ is determined by the hypothesised transformation. In
particular, given a central reference point $\bar{x}_s$ in $X_s$ and a radial basis function $w(r)$ that we use
to weight the contribution of each $x \in X_s$, we want $\mathcal{E}^2$ to be a single number that summarises
the weighted distribution of the squared Euclidean differences $d^2_{s,t}$ between all points $x \in X_s$
and $x' \in X_t$:

$$d^2_{s,t}(x, x') \triangleq \left\| F_{C,s}(x) - F_{C,t}(x') \right\|^2 , \qquad (4.2)$$

where each $(x, x')$ pair is associated with a weight $w'_{s,t}$:

$$w'_{s,t}(x, x') \triangleq (1 - F_{O,s}(x)) (1 - F_{O,t}(x'))\, w_s(x) \qquad (4.3)$$

$$w_s(x) \triangleq w(\| x - \bar{x}_s \|) . \qquad (4.4)$$


The most obvious choice of $\mathcal{E}^2$ would be the weighted mean of $d^2_{s,t}$ evaluated over all pairs of
corresponding points. However, it is well-known that the mean is sensitive to outliers, which
can cause problems when $F_O$ is not able to mask out every pixel that has a strong specular
component, or when a significant minority of points in $X_s$ near boundaries (such as vessel
boundaries) look very different to the corresponding points in $X_t$. The latter can happen not
because the importance sampler has hypothesised an incorrect state, but because of the simplifying
assumptions we make about the deformations of the patches and the local likelihood subregions,
which mean that the best-fitting states will almost never perfectly register these regions. A more
robust choice would be to consider the $100p$th percentile of the weighted distribution (e.g. the
median, with $p = 0.5$). But a single percentile does not capture much information about the part
of the distribution below it. An important feature of the distribution to consider is how close all
of the mass below the $100p$th percentile is to 0, because a distribution with a large weighted
proportion of sub-percentile $d^2_{s,t}$ values close to 0 should give a smaller value of $\mathcal{E}^2$ than a
distribution with the same $100p$th percentile that attaches less weight to low values.


Let $d^2$ be a random variable defined by sampling a pair of corresponding points $x \in X_s$ and
$x' \in X_t$, with probability proportional to $w'_{s,t}(x,x')$, and setting $d^2 = d^2_{s,t}(x,x')$. Also, let $d^{*2}_p$ refer
to the $100p$th percentile of $d^2$’s distribution. $\mathcal{E}^2$ should return 0 if and only if the conditional
random variable

$$d'^2 \triangleq d^2 \mid d^2 \le d^{*2}_p \qquad (4.5)$$

follows the Dirac delta distribution $\delta(d'^2)$, i.e. if $d'^2 = 0$ always. So a natural way to
incorporate information into $\mathcal{E}^2$ about how far $d'^2$’s distribution is from $\delta$ is to measure the cost of
transforming it into $\delta$.


A useful measure of this cost is the Earth Mover’s Distance (EMD, also known as the 1st
Mallows or Wasserstein distance). This function is a metric on the space of n-variate probability
distributions that can be thought of as a measure of the minimum amount of energy involved
in the process of turning one distribution into the other by “pushing” mass around, where the
energy needed to move mass $m$ over distance $d$ is $md$ (the name comes from the idea that the
PDFs are like piles of earth). It has been used extensively in image retrieval, where its measure
of the difference between histograms or more compact signatures of image regions often proves
to be closer to the perceptual difference than the measure given by pointwise functions like $L^p$
metrics and the Kullback-Leibler divergence (e.g. see [89]), although pointwise correspondence
is not typically assumed in such applications.

For continuous univariate random variables $X$ and $Y$ with CDFs $F(x)$ and $G(y)$ respectively,
[64] states that the EMD $\rho[F, G]$ satisfies

$$\rho[F, G] = \int_{-\infty}^{\infty} |F(x) - G(x)| \, dx . \qquad (4.6)$$

The CDF of $\delta(\cdot)$ is given by the Heaviside step function:

$$H_\delta(x) \triangleq \int_{-\infty}^{x} \delta(y) \, dy = \begin{cases} 0 , & x < 0 \\ 1 , & \text{otherwise} . \end{cases} \qquad (4.7)$$

So if $d'^2$’s CDF is $H^*_p$, then by Riemann-Stieltjes integration and a change of integration order:

\begin{align}
\rho[H^*_p, H_\delta] &= \int_0^{d^{*2}_p} 1 - H^*_p(x) \, dx \qquad (4.8) \\
&= \int_0^{d^{*2}_p} \int_x^{d^{*2}_p} dH^*_p(y) \, dx \\
&= \int_0^{d^{*2}_p} \int_0^{y} dx \, dH^*_p(y) \\
&= \int_0^{d^{*2}_p} y \, dH^*_p(y) \\
&= E[d'^2] \\
&= E[d^2 \mid d^2 \le d^{*2}_p] . \qquad (4.9)
\end{align}

In other words, the EMD between $\delta$ and $d'^2$’s distribution is equal to all of the following: the
area of the region above $H^*_p$ and below the line $y = 1$; $d'^2$’s mean; the conditional mean of $d^2$,
under the condition that it lies below the $100p$th percentile. So in the special case where $p = 1$,
$\rho[H^*_1, H_\delta]$ is $d^2$’s mean, and in general it lies in the interval $[0, d^{*2}_p]$.
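
The equality between the area above the CDF and the conditional mean is easy to check numerically. The sketch below does so for an unweighted empirical distribution of non-negative values; the function names are hypothetical, and equal weights are assumed purely to keep the example short.

```python
def conditional_mean(values, cutoff):
    """Mean of the values that do not exceed `cutoff`."""
    kept = [v for v in values if v <= cutoff]
    return sum(kept) / len(kept)

def area_above_cdf(values, cutoff):
    """Integrate 1 - F(x) over [0, cutoff], where F is the empirical CDF of the
    values that do not exceed `cutoff` (all values must be non-negative)."""
    kept = sorted(v for v in values if v <= cutoff)
    n = len(kept)
    area, previous = 0.0, 0.0
    for rank, value in enumerate(kept):
        # F equals rank / n on the interval [previous, value)
        area += (value - previous) * (1.0 - rank / n)
        previous = value
    return area
```

For the squared differences 0, 1, 4, 9 and 100 with a cutoff of 9, both functions return 3.5, whereas the unconditional mean is dominated by the single outlying value.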
We define an idealisation $\mathcal{E}^*$ of $\mathcal{E}$ as

$$\mathcal{E}^*_{s,t}(X_s, X_t; w, p) \triangleq \begin{cases} \dfrac{1}{c} \sqrt{\rho[H^*_p, H_\delta]} , & \dfrac{\int w'_{s,t}(x,x') \, d(x,x')}{\int w_s(x) \, dx} \ge T \\ \bot , & \text{otherwise} , \end{cases} \qquad (4.10)$$

where: the integral in the numerator is taken over all pairs of corresponding points in $X_s$
and $X_t$; the integral in the denominator is taken over all points in $X_s$; the scaling constant $c$
ensures that $\mathcal{E}^*$’s image is the unit interval $[0,1]$ (its value is the maximum possible Euclidean
distance between pixel values, and so it only depends on the colour model that $F_C$ uses); and
the threshold $T$ determines the maximum total weight that pixel pairs with an occluded pixel
can have without $\mathcal{E}$ failing and returning $\bot$, indicating that it is undefined. In all of the tests
presented in this thesis, we use $T = 0.5$. Note that the square root does not necessarily have to
be taken, as it can be incorporated into the value of $\Upsilon$’s $\alpha$ hyperparameter.

Figure 4.2: This figure illustrates the relationship between our definition of $h$ (depicted by
the blue horizontal line segments), the bin boundaries (filled circles indicate closed bounds,
unfilled represent open bounds), and the CDF $H_p$ (the green dot-dashed line segments). In this
example, the mass of $h$ over each bin and the gradient of $H_p$ over each bin decrease by a factor
of half from left to right. We approximate $\rho[H^*_p, H_\delta]$ as $\rho[H_p, H_\delta]$, which is the area above
$H_p$ and below the line $y = 1$.


We cannot calculate $\rho[H^*_p, H_\delta]$ exactly, as $H^*_p$ and $d^{*2}_p$ cannot be evaluated exactly in most
practical cases. So instead, we consider an indexed subset $x_s(i)$ of $X_s$, with corresponding
points $x_t(i) \in X_t$, and we define the squared differences $d^2(i)$ and weights $w'(i)$ of these pairs
as

$$d^2(i) \triangleq d^2_{s,t}(x_s(i), x_t(i)) \qquad (4.11)$$

$$w'(i) \triangleq w'_{s,t}(x_s(i), x_t(i)) . \qquad (4.12)$$

We then estimate $H^*_p$ and $d^{*2}_p$ by constructing an $N$ bin histogram $h$ over the observed $d^2(i)$.
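Let

$$d^{2\downarrow} \triangleq \min_i d^2(i) , \qquad d^{2\uparrow} \triangleq \max_i d^2(i) . \qquad (4.13)$$

We define the width $W$ of $h$’s bins as

$$W \triangleq \frac{d^{2\uparrow} - d^{2\downarrow}}{N} \qquad (4.14)$$

and the boundary $b(j)$ of the $j$th bin of $h$ as

$$b(j) \triangleq \begin{cases} [\, d^{2\downarrow} + (j-1)W ,\; d^{2\downarrow} + jW \,) , & 1 \le j < N \\ [\, d^{2\uparrow} - W ,\; d^{2\uparrow} \,] , & j = N . \end{cases} \qquad (4.15)$$

We then construct $h$ by adding $w'(i)$ to the bin whose boundary contains $d^2(i)$, for each $i$, and normalising.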


Assuming the probability density of $d^2$ over each bin $b(j)$ to be uniform and proportional to
$h(j)$, our estimate $d^2_p$ of $d^{*2}_p$ is given by

$$d^2_p \triangleq d^{2\downarrow} + \left( k + \frac{p - \sum_{j=0}^{k} h(j)}{h(k+1)} \right) W , \qquad (4.16)$$

where $k$ is the largest integer such that

$$\sum_{j=0}^{k} h(j) \le p < \sum_{j=0}^{k+1} h(j) . \qquad (4.17)$$

We define $h(0) = 0$ and treat $h(N+1)$ as any finite value > 0 (that we will never need to use),
so that $k = 0$ for $p < h(1)$ and $k = N$ for $p = 1$.


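Our estimate of $H^*_p$ is given by the piecewise differentiable conditional CDF $H_p$ of our piecewise
uniform interpretation of $h$:

$$H_p(d^2) \triangleq \begin{cases} \dfrac{1}{p} \Big( H(\lfloor u(d^2) \rfloor) + \big( u(d^2) - \lfloor u(d^2) \rfloor \big)\, h(\lfloor u(d^2) \rfloor + 1) \Big) , & d^{2\downarrow} \le d^2 < d^2_p \\ 0 , & d^2 < d^{2\downarrow} \\ 1 , & d^2 \ge d^2_p \end{cases}$$

$$u(d^2) \triangleq \frac{d^2 - d^{2\downarrow}}{W} \qquad (4.18)$$

$$H(j) \triangleq \sum_{i=0}^{j} h(i) , \qquad (4.19)$$

and noting that the area above $H_p$ is just the sum of the areas of trapeziums (as illustrated in
figure 4.2), we approximate $\rho[H^*_p, H_\delta]$ as follows:

\begin{align}
\rho[H^*_p, H_\delta] &\approx \rho[H_p, H_\delta] \\
&= \int_0^{d^2_p} 1 - H_p(x) \, dx \\
&= d^{2\downarrow} + \sum_{j=1}^{k} \int_{b(j)} 1 - H_p(x) \, dx + \int_{d^{2\downarrow} + kW}^{d^2_p} 1 - H_p(x) \, dx \\
&= d^{2\downarrow} + \frac{W}{2} \sum_{j=1}^{k} \left( \Big( 1 - \frac{H(j-1)}{p} \Big) + \Big( 1 - \frac{H(j)}{p} \Big) \right) + \frac{d^2_p - d^{2\downarrow} - kW}{2} \left( 1 - \frac{H(k)}{p} \right) \\
&= d^{2\downarrow} + \frac{W}{2p} \left( \sum_{j=1}^{k} \big( 2p - H(j-1) - H(j) \big) + \frac{(p - H(k))^2}{h(k+1)} \right) , \qquad (4.20)
\end{align}

and we approximate the integrals in the condition of eq. (4.10) as

$$\frac{\int w'_{s,t}(x,x') \, d(x,x')}{\int w_s(x) \, dx} \approx \frac{\sum_i w'(i)}{\sum_i w_s(x_s(i))} . \qquad (4.21)$$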


Function CalcDeltaEMD and its helper function, given in appendix C, describe an algorithm
for calculating $\rho[H_p, H_\delta]$. It requires $|x_s|$ iterations to calculate $d^2(\cdot)$, $d^{2\downarrow}$ and $d^{2\uparrow}$, an additional
$|x_s|$ iterations to calculate $h$ (assuming that the time taken to initialise it to 0 is negligible),
and, in the worst case, an additional $N$ iterations to calculate $\rho[H_p, H_\delta]$ (note that we do not
need to explicitly normalise $h$). The algorithm is no slower than the fastest algorithms for
calculating $d^2_p$ (which must either sort the values first in $O(|x_s| \log |x_s|)$ time and select $d^2_p$ in
$O(|x_s|)$ time, or construct $h$ and use it to approximate $d^2_p$), and only requires at most $|x_s| + N$
more iterations than calculating the weighted mean.
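
The histogram-based approximation can be sketched as follows. This is not a transcription of CalcDeltaEMD from appendix C, which is not reproduced here; it is a minimal illustration written from the description in this section. The function name `delta_emd`, the rule used to assign values to bins (the last bin is treated as closed on both sides), and the explicit normalisation of the weights are all assumptions made for the example.

```python
def delta_emd(d2, weights, n_bins, p):
    """Approximate the area above the conditional CDF of the weighted squared
    differences `d2`, i.e. their mean below the 100p-th percentile (0 < p <= 1)."""
    lo, hi = min(d2), max(d2)
    if hi == lo:
        return lo                       # degenerate case: all mass at one value
    width = (hi - lo) / n_bins
    total = sum(weights)

    # h[1..n_bins] is the normalised histogram; h[0] = 0 by definition.
    h = [0.0] * (n_bins + 1)
    for value, weight in zip(d2, weights):
        j = min(n_bins, int((value - lo) / width) + 1)
        h[j] += weight / total

    # cum[j] = h[0] + ... + h[j]
    cum = [0.0] * (n_bins + 1)
    for j in range(1, n_bins + 1):
        cum[j] = cum[j - 1] + h[j]

    # k = number of complete bins lying below the percentile
    k = 0
    while k < n_bins and cum[k + 1] <= p:
        k += 1

    area = lo                           # the CDF is 0 on [0, lo]
    for j in range(1, k + 1):           # one trapezium per complete bin
        area += 0.5 * width * ((1 - cum[j - 1] / p) + (1 - cum[j] / p))
    if k < n_bins:                      # triangular piece up to the percentile
        fraction = (p - cum[k]) / h[k + 1]
        area += 0.5 * fraction * width * (1 - cum[k] / p)
    return area
```

With four equally weighted squared differences 0, 1, 2 and 3 and four bins, the sketch returns 1.5 for p = 1 (the mean of the piecewise uniform density) and 0.75 for p = 0.5 (the mean of the lower half).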


Under normal operation, we define $\Delta_\psi[S(g), S(g)+d]$ in terms of the region of $F_{C,t-1}$ contained
within subregion $S(g)$ and the region of $F_{C,t}$ contained within the displaced subregion $S(g)+d$
as


ls 
AvlS(g), 5(g) + d] = E-a1 (S(g), (9) +N (50,5) .p) (4.22) 
for some $p$, assuming an arbitrary, but consistent, ordering over the subregions’ pixels (e.g. a
row-major ordering). We weight the pixels with the Gaussian radial basis function

$$N(r; 0, \sigma) \propto \exp\left\{ -\frac{r^2}{2\sigma^2} \right\} \qquad (4.23)$$

(the constant of proportionality does not matter, because it disappears when the histogram
$h$ is normalised) to compensate for the fact that by uniformly displacing $S(g)$’s pixels by $d$
rather than transforming them exactly as prescribed by the hypothesised transformations, we
introduce errors that will generally increase with each pixel’s distance from $\bar{x}_{t-1}$. As mentioned
in the previous chapter, it is sometimes useful to compare $F_{C,t}$ to earlier images; we will discuss
this in section 6.5. Note that since both the subregion radius $r_s$ and the positions of $S(g)$’s
pixels relative to their mean are fixed, the Gaussian weights $w_s$ will never change, so they can
be precomputed and cached for later use.
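
Because the weights depend only on each pixel's offset from the centre of the subregion, they can be stored in a lookup table that is built once. The sketch below assumes a disc-shaped subregion on an integer pixel grid; the function name `gaussian_weights` and the value of `sigma` in the test call are hypothetical and are not the settings used in this thesis.

```python
import math

def gaussian_weights(radius, sigma):
    """Map each integer offset (dx, dy) inside a disc of the given radius to
    its unnormalised Gaussian weight exp(-r^2 / (2 sigma^2))."""
    weights = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r2 = dx * dx + dy * dy
            if r2 <= radius * radius:
                weights[(dx, dy)] = math.exp(-r2 / (2.0 * sigma * sigma))
    return weights
```

The table can be computed when the filter is initialised and reused for every subregion and frame, since neither the radius nor the offsets change.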


For $\Delta_{L,0,t}[V(\Theta_0), V(\Theta_t)]$, we define $x_0$ as the (arbitrarily ordered) list of pixels within the
boundary of $V(\Theta_0)$, and we calculate their corresponding bilinear interpolation coefficients
$\alpha(i)$ and $\beta(i)$. We always define $V$ as a square and $\Theta_0$ as a translation, so

$$\alpha(i) = \frac{x_{0,x}(i) - v'_x}{s} , \qquad \beta(i) = \frac{x_{0,y}(i) - v'_y}{s} , \qquad (4.24)$$

where $v'$ is $V(\Theta_0)$’s bottom-left vertex and $s$ is the length of $V(\Theta_0)$’s sides. Taking $V$’s vertices
$v_1, \dots, v_4$ to be defined in anticlockwise order starting from the bottom-left vertex, we define
$x_t$ in terms of the interpolation coefficients:

$$x_t(i) \triangleq \{ (1 - \alpha(i)) M(v_4; \Theta_t) + \alpha(i) M(v_3; \Theta_t) \}\, \beta(i) + \{ (1 - \alpha(i)) M(v_1; \Theta_t) + \alpha(i) M(v_2; \Theta_t) \} (1 - \beta(i)) , \qquad (4.25)$$
and Ay as 
Axo4[V (0), V(8)] = Eo. (ao, 2 N (. 0, =) .P) ; (4.26) 


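for some $p$, again using a Gaussian radial basis function to account for the fact that the errors
in $V(\Theta_t)$ under the best-fitting $\Theta_t$ will generally increase towards the patch’s boundary (as
before, the Gaussian weights do not change, so they can be precomputed and cached). The
ordinates of $a_t$’s vertices will often be non-integral, so we calculate points in $F_C$ by bilinearly
interpolating:

$$\begin{aligned}
F_C(a_t) ={} & F_C(\lfloor a_{t,x} \rfloor, \lfloor a_{t,y} \rfloor)(\lfloor a_{t,x} \rfloor + 1 - a_{t,x})(\lfloor a_{t,y} \rfloor + 1 - a_{t,y}) \, + \\
& F_C(\lceil a_{t,x} \rceil, \lfloor a_{t,y} \rfloor)(a_{t,x} - \lfloor a_{t,x} \rfloor)(\lfloor a_{t,y} \rfloor + 1 - a_{t,y}) \, + \\
& F_C(\lfloor a_{t,x} \rfloor, \lceil a_{t,y} \rceil)(\lfloor a_{t,x} \rfloor + 1 - a_{t,x})(a_{t,y} - \lfloor a_{t,y} \rfloor) \, + \\
& F_C(\lceil a_{t,x} \rceil, \lceil a_{t,y} \rceil)(a_{t,x} - \lfloor a_{t,x} \rfloor)(a_{t,y} - \lfloor a_{t,y} \rfloor) . \qquad (4.27)
\end{aligned}$$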
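The two interpolation steps above (placing a template pixel inside the transformed quadrilateral, then sampling the image at the resulting non-integral position) can be sketched as follows. The function names `patch_point` and `bilinear_sample` are hypothetical, the vertices passed to `patch_point` are assumed to have already been transformed, and the image is assumed to be a single-channel list of rows indexed as `image[y][x]`.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def patch_point(alpha: float, beta: float,
                v1: Point, v2: Point, v3: Point, v4: Point) -> Point:
    """Blend four transformed vertices (anticlockwise from bottom-left), as in eq. (4.25)."""
    top = [(1 - alpha) * v4[k] + alpha * v3[k] for k in range(2)]
    bottom = [(1 - alpha) * v1[k] + alpha * v2[k] for k in range(2)]
    return (top[0] * beta + bottom[0] * (1 - beta),
            top[1] * beta + bottom[1] * (1 - beta))


def bilinear_sample(image: Sequence[Sequence[float]], x: float, y: float) -> float:
    """Bilinearly interpolate image[y][x] at a non-integral position, as in eq. (4.27).

    The caller must ensure that (x, y) lies inside the image; no bounds
    checking is performed in this sketch.
    """
    x_lo, x_hi = math.floor(x), math.ceil(x)
    y_lo, y_hi = math.floor(y), math.ceil(y)
    wx_hi, wy_hi = x - x_lo, y - y_lo        # weights of the "ceiling" samples
    wx_lo, wy_lo = 1.0 - wx_hi, 1.0 - wy_hi  # weights of the "floor" samples
    return (image[y_lo][x_lo] * wx_lo * wy_lo
            + image[y_lo][x_hi] * wx_hi * wy_lo
            + image[y_hi][x_lo] * wx_lo * wy_hi
            + image[y_hi][x_hi] * wx_hi * wy_hi)
```

Using the ceiling rather than floor-plus-one for the upper samples means that an integral ordinate on the image border never indexes outside the image, since the corresponding weight is zero and the upper and lower samples coincide.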


Just as we needed a way of handling undefined values of $\Delta_V$, which we presented in the previous
chapter, we also need a way of handling undefined values of $\Delta_L$. Let $J_d$ be the set of particle
indices such that $\Delta_{L,0,t}[V(\Theta_0), V(\Theta_t^j)]$ is defined for each $j \in J_d$, and let $J_u$ be the indices of
the particles for which it is undefined. Without at least looking for relevant image information
in earlier frames, we cannot assume that $\Theta_t^j$ is more likely than $\Theta_t^{j'}$ for any $j, j' \in J_u$. So we
once again take the most conservative approach, this time by maximising the entropy of the
particle distribution $\{(\Theta_t^j, w_t^j)\}$ by setting $\mathcal{L}_L(\Theta_t^j)$ to a constant $e^{-w_u}$ for each $j \in J_u$, where, as




we show in appendix B.5, the value of $w_u$ given by


b 
Wy = max € = *.0) 


c= S- wl Mn (w'! ‘) b= S- wi hn, (OP : In (wi iro? ') 


ESa J ETu 
c= owl d= SF wlip(e?") 
eSa J ETu 


(4.28) 


maximises the entropy, assuming the form of the unnormalised weight update equation given
in eq. (3.16). Note that this result assumes that the maximum likelihood that $\mathcal{L}_L$ can assign
to a particle is 1. If this is not the case, then the unnormalised weights should be divided
by the maximum likelihood before calculating $w_u$. Of course, this rescaling has no effect once
the weights are normalised.


All that remains now is for us to discuss the construction of the filters $F_{C,t}$ and $F_{O,t}$. We will do
this in the following sections.


4.3 Lightness Normalisation Filtering 


In analysing and removing the effects of shading and illumination changes on an image, it is vital 
for us to be able to separate information about the brightness of each pixel from information 
about its colour. Furthermore, given our use of the Euclidean difference to compare pairs of 
pixels, it would also be ideal to process the images in a perceptually uniform colour space, i.e. 
a space in which pairs of pixels that we perceive to be equally dissimilar have equal Euclidean 
differences. The CIE LAB and CIE LUV colour spaces [93] provide adequate solutions to these 
issues. Both colour spaces approximate perceptual uniformity over three channels, one of which 
is a lightness channel, L*, that measures the perceived brightness of a pixel. So before defining
$F_C$, we convert the input image $\mathcal{I}_t$ into either of these spaces (the conversion processes are
described in [93]). We then carry out the remaining steps of removing shading and illumination
effects on the L* channel alone.


4.3.1 Illumination Model


The graphics literature is replete with lighting models of varying degrees of physical accuracy. 
A common feature amongst many of them is the modelling of the radiance of the light reflected 
from a surface as a linear combination of diffuse radiance terms and specular radiance terms. 
Shafer describes the reflection of these distinct forms of light from a number of materials in [97], 
stating that specular reflection occurs when light incident on an object is perfectly reflected 
about the surface normal at the interface between the object’s body and the air, as a result of 
the differing refractive indices of the object and the air, whilst diffuse reflection is a result of 
the incident light penetrating the interface and interacting with colourants within the object’s 
body before being selectively absorbed/retransmitted (this process is illustrated in figure 4.3a). 
The Dichromatic Reflection Model that he proposes in that paper is a useful reference for the
assumptions we will make in defining $F_{C,t}$, because it has a simple mathematical statement,
and yet it is quite general in the sense that it subsumes lighting models such as the Phong
model [79], and can explain more physically accurate models such as the Blinn-Phong model
[13] and Cook-Torrance model [21] up to Fresnel effects* (the physical accuracy of these and
other models was studied in [72], where the Cook-Torrance model was shown to be one of the
most accurate models under investigation).


The illumination model we assume is a modification of the Dichromatic Reflection Model. Given
an arbitrary parameterisation of the myocardial surface under which we represent each point
with the parameter vector $p$, we express the total radiance $r_t$ reflected from the 3D surface
point that $p$ corresponds to at time $t$ as a function of: wavelength $\lambda$; the angle $\phi_i$ between the
surface normal $\hat{n}$ at the point and the direction $\hat{\imath}$ of the incident light; the angle $\phi_v$ between $\hat{n}$


*By “Fresnel effects” we mean reflection governed by the laws of the reflection and transmission of electro- 
magnetic waves at an interface set out by the physicist Augustin-Jean Fresnel. In particular, these equations 
describe an interdependence in the effects of wavelength and angle of incidence on reflectance. The Dichromatic 
Reflection Model makes the simplifying assumption that wavelength and angle of incidence have independent 
effects on reflectance, but Shafer estimates the error introduced by this assumption to be small when the angle 
of incidence is no greater than 60°, which it is in most parts of the intraoperative videos in our application. 
and the viewing direction $\hat{v}$; and the angle $\phi_g$ between $\hat{\imath}$ and $\hat{v}$ (see figure 4.3b); as follows:

$$r_t(\lambda, \phi_i, \phi_v, \phi_g, p) = m_s(\phi_i, \phi_v, \phi_g)\, \Phi_s(\lambda) + m_d(\phi_i, \phi_v, \phi_g)\, \Phi_d(\lambda, p) , \qquad (4.29)$$


where $\Phi_s$ and $\Phi_d$ are the spectral power distributions (SPDs) of the specular and diffuse components
respectively, and $m_s$ and $m_d$ are scaling factors that only depend on the angles. Note
that the photometric angles should all be considered to be functions of $p$ and $t$, but we suppress
this to avoid notational clutter.


The most significant modification we have made is to allow $\Phi_d$ to vary spatially (in the original
model, it was assumed to be fixed everywhere up to a scaling factor). Permitting spatial
variations allows the albedo of the surface (an intrinsic surface property that controls the amount
of incident light diffusely reflected from each surface point at each wavelength, and that hence
determines the surface’s texture) to be non-uniform. Three convenient assumptions from the
original model that we do retain, however, are that:

1. there is a single light source (this is true in our case, and in fact $\hat{\imath} = \hat{v}$, since the light
source is fixed to the endoscope),


2. the relative SPD (i.e. the distribution that results from scaling the SPD so that it
integrates to 1) of the incident light is spatiotemporally invariant,


3. there are no interreflections between surfaces. 


From these assumptions, it follows that 


$$\Phi_d(\lambda, p) = \rho(\lambda, p)\, \Phi_i(\lambda)\, \varsigma_t(p) , \qquad (4.30)$$


where: $\rho(\lambda, p) \in [0,1]$ is the albedo at point $p$; $\Phi_i$ is the relative SPD of the light source; and
$\varsigma_t(p)$ is a spatiotemporally varying scaling factor that accounts for the total radiance of the
light over all wavelengths, shadowing and any attenuation in light intensity that may occur due
to the inverse square law. Since the albedo is a time-invariant intrinsic surface property, an
Figure 4.3: (a) The interactions between light and an inhomogeneous material. (b) The photometric
angles used in the Dichromatic Reflection Model. Image (a) taken, and image (b)
adapted, from [97].


estimate of it or of some time-invariant function of the triplet $(\lambda, p, \rho(\lambda, p))$ (we will explain
what we mean by this in section 4.3.3) would be an ideal representation of L* in the absence
of illumination effects.


In addition to these three assumptions, we also assume that the camera’s exposure time is 


negligibly small, and hence that there is no significant motion blur. 


4.3.2 Related Work 


The estimation and/or removal of illumination effects is an important component of facial
recognition/tracking algorithms, and a number of methods for solving the problem have been
proposed in the literature. One group of related approaches replaces the task of estimating
the parameters of a physics-inspired lighting model with that of estimating the coefficients of
a linear model. For instance, [46] proposed a PCA-based approach in which the appearance of
faces under arbitrary lighting conditions is modelled as a linear combination of “eigenfaces”.
Similarly, [98] later showed that under perfect Lambertian reflection (for which $\Phi_s = 0$ and
$m_d(\phi_i, \phi_v, \phi_g) = \cos \phi_i$) without shadows, three images of a surface taken from a fixed viewpoint,
each with linearly independent light source directions, suffice as a linear basis that can be
used to synthesise images from the same viewpoint under novel lighting conditions. [45] built
on this result, asserting that the effects of shadowing can be approximately incorporated by
constructing a higher-dimensional linear basis. They demonstrated the effectiveness of this idea
by incorporating the estimation of the basis coefficients into an optimisation-based tracker that
calculates both illumination changes and affine shape changes in videos of human faces.


Unfortunately, all of these methods require training sets containing images of the object of 
interest under different lighting conditions, and so they would only be practical for the task 
of tracking arbitrary myocardial patches if we had a way of synthesising the appearance of 
the patches under randomly selected viewing conditions. This would only be possible if depth 
information were readily available for each image. If depth information were available, however,
we could define the camera to be at the origin and define $\hat{\imath} = \hat{v} = (0,0,1)^T$, and then calculate
the surface normals at each point relative to this coordinate system with little difficulty. Under
an appropriate model of $m_s$ and $m_d$, this information would significantly simplify the problem
of estimating the albedo.


Depth information could potentially be estimated for each image using shape-from-shading 
techniques. One such method, developed for an endoscopic stomach surgery application, was 
presented in [76]. It replaces many of the assumptions commonly made in shape-from-shading 
algorithms (a distant light source, orthographic projection and Lambertian reflection) with as- 
sumptions that are more appropriate for endoscopic images, such as ours. Unfortunately, it 
does not relax the assumption of a uniform albedo, and so it is not clear how well it would 
perform on our data. Shape-from-shading algorithms have been formulated under the assump- 
tion of non-uniform albedo however, and one such algorithm is given in [86], where, under 
the assumption that the surface can be locally approximated by planar patches, the authors 
express the problem in the form of an albedo-independent linear relationship between the im- 
age gradients (normalised by image intensity) and the patch’s normal. We may give further 
attention to shape-from-shading techniques in future developments if more significant uses for
depth information present themselves, but we felt that the present problem could be solved
with simpler, better-behaved image processing techniques.


One final common solution to the problems caused by illumination changes that we should
mention is that of circumventing the problem altogether by working with image edges. Isard
and Blake’s particle filtering work was based on the idea of measuring the distances from image
edges to the edges of a 2D line model [52], and more recent work has extended the idea to using 
3D line models to track rigid 3D camera motions [83, 59]. The problem with using edges is that 
they discard a large amount of information about subtle textural variations in the interior of 
the segments they bound. Typical intraoperative myocardial images tend to be much richer in 
textural detail than they are in strong edges, so it is preferable to use an illumination change 


solution that preserves as much textural information as possible. 


4.3.3 Local Lightness Normalisation 


The fact that the myocardium is shiny means that the specular reflection term $m_s$ generally
produces specular highlights with sharp boundaries, beyond which the diffuse component is the
dominant term of $r_t$. So for any point $p$ not within or near the boundary of a specular highlight,
we assume

$$r_t(\lambda, \phi_i, \phi_v, \phi_g, p) \approx m_d(\phi_i, \phi_v, \phi_g)\, \rho(\lambda, p)\, \Phi_i(\lambda)\, \varsigma_t(p) . \qquad (4.31)$$


Let $C[x(\cdot)]$ be the functional that maps an SPD $x(\cdot)$ to the corresponding value of the L*
channel. As shown in [93], $C$ is a strictly-increasing function of the CIE XYZ space’s Y channel
(the channel that represents luminance), which in turn is defined as a linear function of
$x$ weighted by a fixed colour matching function (CMF). The mapping from Y to L* is continuous
everywhere and smooth at all but one point, at which a change is made from a linear model
of our perception of dim light to a cube root model of our perception of brighter light. Given
this and the fact that for smooth surfaces like the myocardium, the diffuse scaling factor $m_d$
is generally well modelled as a smoothly varying function of the photometric angles, it seems
reasonable to model the local variations that $m_d$ induces on regions of the L* channel that
correspond to a single surface using a parametric surface with a small number of degrees of
freedom. By this we mean that if $\rho(\cdot, p)$ and $\varsigma_t(p)$ were spatially invariant, the mapping
$q \mapsto C[r_t(\cdot, \phi_i, \phi_v, \phi_g, p)]$, where $q$ denotes the image coordinate that corresponds to $p$, would
locally closely match some simple parametric model over the image coordinates $q$. Noting that
any light intensity attenuation due to the inverse square law will also induce a smooth effect
over regions of the L* channel that correspond to a single surface, if we additionally assume
that the boundaries of the shadows in each input image decay smoothly, then we can go one
step further and use a simple parametric surface to model the local variations that $m_d \varsigma_t$ induces
on L*.


Let: $L^*_{I,t}$ denote the time $t$ input L* channel; $Q_t$ be a region of the channel corresponding
to a single surface, over which $m_s$ is negligible; $S'(q \in Q_t; \mu_t)$ be a parametric surface that
approximates the effect that $m_d \varsigma_t$ induces over this region of $L^*_{I,t}$, where $\mu_t$ is $S'$’s parameter
vector; and $P_t$ be the set of myocardial surface parameters that project into $Q_t$ without being
occluded. We will use $p \in P_t$ to denote the surface parameter that maps to whatever $q \in Q_t$
we happen to be talking about, notationally omitting $p$’s dependence on $q$ for clarity.


As L* is approximately perceptually uniform, we use an additive expression to approximately
map the relationship of eq. (4.31) to L*:

$$L^*_{I,t}(q \in Q_t) \approx S'(q; \mu_t) + \rho'(q) , \qquad (4.32)$$


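for some function $\rho'(q)$ that represents the effects that $\rho$ induces on $L^*_{I,t}$. Noting that [93] gives
the following expression for a function $C'(Y)$ that, up to a strictly-increasing linear function,
maps the Y channel of CIE XYZ space to L*:

$$C'(Y) = \begin{cases} \sqrt[3]{Y} , & Y > \left( \frac{6}{29} \right)^3 \\ \frac{841}{108} Y + \frac{16}{116} , & \text{otherwise,} \end{cases} \qquad (4.33)$$

it follows that, up to the errors in $S'$, eq. (4.32) is equivalent to approximating $C'(Y)$ with a
strictly-increasing linear function of $\log(Y)$, since

$$\begin{aligned}
Y &= \int r_t(\lambda, \phi_i, \phi_v, \phi_g, p)\, w(\lambda)\, d\lambda \\
&\approx m_d(\phi_i, \phi_v, \phi_g)\, \varsigma_t(p)\, E_{\Phi_i}[w(\lambda) \rho(\lambda, p)] \\
\Rightarrow \log(Y) &\approx \log(m_d(\phi_i, \phi_v, \phi_g)\, \varsigma_t(p)) + \log(E_{\Phi_i}[w(\lambda) \rho(\lambda, p)]) , \qquad (4.34)
\end{aligned}$$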
Figure 4.4: (a) A comparison of $C'(Y)$ and $a \log(Y) + b$. (b) A plot of the absolute relative error.


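where $w(\lambda)$ is the CMF for channel Y, so that for some variables $a$ and $b$, and arbitrary
$\alpha \in [0, 1]$:

$$\begin{aligned}
S'(q; \mu_t) &\approx a \log(m_d(\phi_i, \phi_v, \phi_g)\, \varsigma_t(p)) + \alpha b \\
\rho'(q) &= a \log(E_{\Phi_i}[w(\lambda) \rho(\lambda, p)]) + (1 - \alpha) b \qquad (4.35)
\end{aligned}$$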


The benefit of this logarithmic approximation and our earlier assumption of $\Phi_i$’s spatiotemporal
invariance is that $\rho'$ only varies with the albedo $\rho$, which simplifies the task of separating
$\rho$’s effects from those of $m_d \varsigma_t$ in L* space. Figure 4.4 shows the errors introduced by this
approximation over $Y \in \{\frac{1}{255}, \frac{2}{255}, \ldots, 1\}$, using the least-squares estimate of $a$ and $b$. For
$Y > 0.05$, the absolute relative error does not exceed 0.081, which seems acceptably small given
all of our other approximations. For $Y < 0.05$, the error approaches infinity as $Y$ approaches
0, but such dark pixels do not generally occur in our application, so this does not particularly
matter*.
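The quality of the logarithmic approximation can be checked numerically with a short sketch such as the one below. It assumes the standard CIE piecewise lightness nonlinearity for $C'(Y)$ and an ordinary (unweighted) least-squares fit over 255 equally spaced luminance values; the function names are hypothetical, and the sketch is not guaranteed to reproduce the exact figures quoted above.

```python
import math
from typing import List, Tuple


def c_prime(y: float) -> float:
    """Piecewise CIE lightness nonlinearity: cube root above (6/29)^3, linear below."""
    if y > (6.0 / 29.0) ** 3:
        return y ** (1.0 / 3.0)
    return (841.0 / 108.0) * y + 16.0 / 116.0


def fit_log_model(ys: List[float]) -> Tuple[float, float]:
    """Least-squares a, b minimising the sum of (a*log(y) + b - C'(y))^2."""
    n = len(ys)
    u = [math.log(y) for y in ys]
    v = [c_prime(y) for y in ys]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    covariance = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    variance = sum((ui - mean_u) ** 2 for ui in u)
    a = covariance / variance
    return a, mean_v - a * mean_u


ys = [k / 255.0 for k in range(1, 256)]
a, b = fit_log_model(ys)
# Largest absolute relative error over the luminances that matter in practice.
worst = max(abs(a * math.log(y) + b - c_prime(y)) / c_prime(y) for y in ys if y > 0.05)
print(f"a = {a:.4f}, b = {b:.4f}, worst relative error for Y > 0.05 = {worst:.4f}")
```

The relative error is measured against $C'(Y)$ itself here; a different normalisation would give different numbers.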


We define a lightness normalising filter to be any procedure that estimates $\mu_t$ for each $Q_t$, so
that $F_{C,t}$’s L* channel, which we will refer to as $L^*_{O,t}$ (“O” for “Output”), can be computed
with the approximately photometric-angle-invariant expression

$$L^*_{O,t}(q \in Q_t) = L^*_{I,t}(q) - S'(q; \mu_t) \approx \rho'(q) . \qquad (4.36)$$


*Note that when the first case of eq. (4.33) holds, we could equate $\log(L^*_{I,t})$ with the RHS of eq. (4.32)
instead, and then attempt to extract $\rho'$ and finally exponentiate it. But the approximation we have proposed
is more efficient, as it does not require the calculation of the logarithms or exponentials of any of the pixels.
The degree to which $L^*_{O,t}$ is invariant to the photometric angles is mostly dependent on how well
$S'$ models the effects of $m_d \varsigma_t$ and how accurately $\mu_t$ can be estimated, but some dependencies
may also result from the error in our logarithmic approximation of $C'$. The use of this expression
to remove the illumination effects modelled by $S'$ from $L^*_{I,t}$ is an example of what we meant earlier when we spoke of
calculating a time-invariant function of the triplet $(\lambda, p, \rho(\lambda, p))$, since for a good estimate of
$\mu_t$, $L^*_{O,t}$ will mostly only depend on $\rho(\cdot, p)$.


The simplest case is, of course, when $\rho$ is uniform over all $p \in P_t$, which implies that

$$L^*_{I,t}(q \in Q_t) \approx S'(q; \mu_t) + k , \qquad (4.37)$$

for some constant $k$. In this case, $m_d$ is the only component of $r_t$ that has a significant effect
on $L^*_{I,t}$, and so we can use the least-squares method to estimate the $\mu_t$ under which $S'$, up to
an underdetermined additive constant, best models this effect.


In the more general case of non-uniform $\rho$, suppose $P_t$ could be segmented into regions over
which $\rho$ is approximately uniform. If the largest of these segments accounts for at least 50% of
$P_t$, the remaining points of $P_t$ could be treated as outliers, and so a robust least-squares method,
such as RANSAC [34], could potentially be used to estimate $\mu_t$ without having to explicitly
calculate the segmentation. Note that if the largest segment corresponded to a vessel rather
than the myocardium, this could cause $S'$ to be uniformly offset relative to the surfaces that
we fit to other regions. This is acceptable if the surfaces that model $P_t$ are offset by the same
amount for all $t$, as this will still lead to a representation of L* that is invariant to illumination
changes. We can usually ensure that the myocardium accounts for at least 50% of $P_t$ by simply
increasing $P_t$’s size, but making it too large will reduce the accuracy with which the chosen
type of parametric surface models the effects of $m_d \varsigma_t$.
Figure 4.5: An illustration of the manner in which we arrange the squares over which we fit the
surfaces. We define the overall value of the surface in regions covered by two squares, such as 
the red rectangle, by cubic interpolation. In regions covered by four squares, such as the green 
square, we use bicubic interpolation. 


4.3.4 Parametric Surface Fitting 
We considered two types of parametric surface in our tests: planar surfaces of the form

$$S_P(q; a) = (a_{1:2})^T q + a_3 , \qquad (4.38)$$

for a 3D coefficient vector $a$; and circular paraboloids of the form

$$S_C(q; b) = b_1 (q_x^2 + q_y^2) + S_P(q; b_{2:4}) , \qquad (4.39)$$

for a 4D coefficient vector $b$. We chose these forms for their simplicity, and because the least-squares
parameter estimates for both surfaces have simple linear forms. Despite the sensitivity
of the least-squares estimate to outliers, we found the results that we achieved using these
two surface types and the methods we are about to describe adequate, even over regions of
non-uniform albedo, so we did not use more robust fitting methods.


We began by dividing $L^*_{I,t}$ into a regular grid of $w \times w$ overlapping squares, with an overlap
of $l \le \lfloor w/2 \rfloor$ between all neighbours, as shown in figure 4.5. For each square $Q$, we fitted the
surface to the pixels of the subset $Q'$ that $F_{O,t}$ labels as not being occluded. For any surface
$f(q; c)$ over $Q$ satisfying the parametric linearity condition

$$f(q; c) = c^T g(q) \qquad (4.40)$$
for an arbitrary vector-valued function $g$, the solution to the least-squares problem

$$c = \arg\min_{c'} \sum_{q \in Q'} \left( f(q; c') - L^*_{I,t}(q) \right)^2 \qquad (4.41)$$

is given by the Moore-Penrose pseudoinverse [78]. We will make use of this result numerous
times in this thesis, and we defer a discussion of some of its uses in a slightly more general
context to section 5.3.2. For now we will just state the solution:

$$c = \left( \sum_{q \in Q'} g(q)\, g(q)^T \right)^{-1} \left( \sum_{q \in Q'} L^*_{I,t}(q)\, g(q) \right) . \qquad (4.42)$$


For the planar surface $S_P$, we use

$$g(q) = (q_x, q_y, 1)^T , \qquad (4.43)$$

and for the circular paraboloid surface $S_C$, we use

$$g(q) = (q_x^2 + q_y^2, q_x, q_y, 1)^T . \qquad (4.44)$$
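The least-squares fit for either surface type can be sketched in a few lines. The example below accumulates the two sums of the normal equations and solves the resulting small linear system by Gauss-Jordan elimination; the function names and the pure-Python solver are choices made for this illustration, and a linear algebra library would normally be preferable.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[float, float]


def basis_planar(q: Pixel) -> List[float]:
    """g(q) for the planar surface."""
    return [q[0], q[1], 1.0]


def basis_paraboloid(q: Pixel) -> List[float]:
    """g(q) for the circular paraboloid."""
    return [q[0] ** 2 + q[1] ** 2, q[0], q[1], 1.0]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few unoccluded pixels?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_surface(pixels: Sequence[Pixel], lightness: Sequence[float],
                basis: Callable[[Pixel], List[float]]) -> List[float]:
    """Coefficients c minimising the sum over pixels of (c . g(q) - L(q))^2."""
    dim = len(basis(pixels[0]))
    gram = [[0.0] * dim for _ in range(dim)]
    rhs = [0.0] * dim
    for q, value in zip(pixels, lightness):
        g = basis(q)
        for i in range(dim):
            rhs[i] += value * g[i]
            for j in range(dim):
                gram[i][j] += g[i] * g[j]
    return solve_linear(gram, rhs)
```

Only the unoccluded pixels of a square should be passed in, so that masked specular highlights do not influence the fitted surface.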


It would be trivial to replace $S_C$ with the more general quadratic surface, in which the quadratic
terms are weighted independently, or to use higher-degree polynomial surfaces. But more
complex surfaces may cause the least-squares method to overfit $L^*_{I,t}$, in the sense of modelling
the effects of albedo variations on it. So such surfaces would only be appropriate when using
robust fitting methods.


After fitting an $S_P$ or $S_C$ (we will use $S_*$ to refer to either type) surface to each square $Q$, the
final stage was to define $S'$ over each $Q$. We set $S'$ to the value of $S_*$ in regions covered by a
single $S_*$, and we (bi)cubically interpolate in regions where multiple $S_*$ overlap using the cubic
kernel

$$k(\alpha \in [0,1]) = (3 - 2\alpha)\alpha^2 , \qquad (4.45)$$

which is one of the four basis functions used in cubic Hermite interpolation [7]. It is a particularly
attractive interpolator to use because it is efficient to compute and it has the following
properties:

$$\begin{aligned}
k(0) &= 0 , & k(1) &= 1 , \\
k'(0) &= 0 , & k'(1) &= 0 , \\
k(\alpha \in [0,1]) &\in [0,1] , & k(1 - \alpha) &= 1 - k(\alpha) , \qquad (4.46)
\end{aligned}$$


where $k'$ is its first derivative. The fact that its first derivatives are zero at the end points
helps to smooth the transitions between overlapping $S_*$, and the last property describes the
nature of its symmetry, which is such that whether: a) the kernel is reflected; b) the overlapping
$S_*$ are rotated by 180°; or c) the overlapping $S_*$ are reflected; the result of the interpolation
will be the same up to a 180° rotation or reflection.
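The listed properties of the kernel are easy to verify numerically. The following sketch evaluates the kernel and its derivative; the function names are hypothetical.

```python
def cubic_kernel(alpha: float) -> float:
    """k(alpha) = (3 - 2 alpha) alpha^2 for alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (3.0 - 2.0 * alpha) * alpha * alpha


def cubic_kernel_derivative(alpha: float) -> float:
    """k'(alpha) = 6 alpha (1 - alpha), which vanishes at both end points."""
    return 6.0 * alpha * (1.0 - alpha)


# Check the symmetry property k(1 - alpha) = 1 - k(alpha) on a few samples.
samples = [i / 10.0 for i in range(11)]
symmetric = all(abs(cubic_kernel(1.0 - s) - (1.0 - cubic_kernel(s))) < 1e-12
                for s in samples)
print("symmetry holds on samples:", symmetric)
```

The symmetry follows algebraically from expanding $(3 - 2(1 - \alpha))(1 - \alpha)^2 = 1 - 3\alpha^2 + 2\alpha^3$.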


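Let: $Q_1$, $Q_2$, $Q_3$ and $Q_4$ be four overlapping squares arranged as in figure 4.5; $S_{*,i}$ denote the $S_*$
that has been fitted to $Q_i$; $[x_1, x_2]$ be the x ordinates of the region of overlap between squares
in the same row; $[y_1, y_2]$ be the y ordinates of the region of overlap between squares in the same
column. We use $k$ to define $S'$ at a point $q$ that lies in the red or green region of figure 4.5
as follows (assuming that x ordinates increase from left to right, and y ordinates increase from
bottom to top):

$$S'(q) = L(q) \left( 1 - k\!\left( \frac{q_x - x_1}{x_2 - x_1} \right) \right) + R(q)\, k\!\left( \frac{q_x - x_1}{x_2 - x_1} \right) , \qquad (4.47)$$

where

$$\begin{aligned}
L(q) &= \begin{cases} S_{*,1}(q) , & q_y < y_1 \\ S_{*,1}(q) \left( 1 - k\!\left( \frac{q_y - y_1}{y_2 - y_1} \right) \right) + S_{*,3}(q)\, k\!\left( \frac{q_y - y_1}{y_2 - y_1} \right) , & \text{otherwise} \end{cases} \\
R(q) &= \begin{cases} S_{*,2}(q) , & q_y < y_1 \\ S_{*,2}(q) \left( 1 - k\!\left( \frac{q_y - y_1}{y_2 - y_1} \right) \right) + S_{*,4}(q)\, k\!\left( \frac{q_y - y_1}{y_2 - y_1} \right) , & \text{otherwise} \end{cases} \qquad (4.48)
\end{aligned}$$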


We define $S'$’s value in the other regions covered by two squares in a similar fashion.
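The blending rule for the red and green regions can be sketched as follows. The example assumes that squares 1 and 2 form the bottom row (left and right) and that squares 3 and 4 form the top row, and that the point lies within the horizontal overlap and no higher than the top of the vertical overlap; the function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Pixel = Tuple[float, float]
Surface = Callable[[Pixel], float]


def cubic_kernel(alpha: float) -> float:
    """k(alpha) = (3 - 2 alpha) alpha^2, clamped to [0, 1] for safety."""
    alpha = min(1.0, max(0.0, alpha))
    return (3.0 - 2.0 * alpha) * alpha * alpha


def blend_surfaces(q: Pixel, surfaces: Sequence[Surface],
                   x1: float, x2: float, y1: float, y2: float) -> float:
    """Blend four fitted surfaces at q using the (bi)cubic rule described above."""
    s1, s2, s3, s4 = surfaces
    qx, qy = q
    kx = cubic_kernel((qx - x1) / (x2 - x1))
    if qy < y1:
        # Below the vertical overlap: only the bottom two squares cover q.
        left, right = s1(q), s2(q)
    else:
        ky = cubic_kernel((qy - y1) / (y2 - y1))
        left = s1(q) * (1.0 - ky) + s3(q) * ky
        right = s2(q) * (1.0 - ky) + s4(q) * ky
    return left * (1.0 - kx) + right * kx
```

Because the blending weights always sum to one, the rule returns the common value wherever all four fitted surfaces agree.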


Figure 4.6: (a) An image from our 230108.FO1 sequence. (b) The result of applying Tan and 
Ikeuchi’s method [107] to (a) to remove the specular highlights, as computed by their source 
code. (c) An intraoperative image used by Stoyanov and Yang. (d) Stoyanov and Yang’s result 
after removing specular highlights from (c). Images (c) and (d) taken from [104]. 


4.4 Specular Highlight Masking 


The two most common approaches to dealing with the distracting effects of specular highlights 
in computer vision problems are: 1) to detect and ignore them; 2) to detect and attempt to 
remove them, ideally leaving an image that only depends on the diffuse reflection components. 
Tan and Ikeuchi proposed an algorithm that follows the latter approach in [107]. Their method 
iterates between a process that checks whether or not the current estimate of the diffuse compo- 
nent still has specular components, based on the dichromatic reflection model and logarithmic 
differentiation, and a process that attempts to reduce the intensity of any remaining specular 
components, based on a transformation of the image pixels into a “maximum chromaticity in- 
tensity space”, in which the diffuse component of the specular pixels can, in theory, be identified 
from the intersection of two lines. Unfortunately, their method performs extremely poorly on 
our images, as shown in figures 4.6a and 4.6b. Its failure to restore the specular highlights is 
probably a result of the fact that they are so bright that they destroy the diffuse component 
information. Furthermore, their method has the undesirable effect of greatly reducing the con- 
trast in regions where there are no specular highlights, which would only serve to make the 


tracking task more difficult than it already is. 


In [103], Stoyanov and Yang proposed a modification of Tan and Ikeuchi’s method to improve its 
performance on intraoperative images. Their idea was to register multiple images from a video 
sequence together, so that the chromaticity information that bright specular highlights destroy 


can be recovered from other frames. Images of their results are shown in figures 4.6c and 4.6d. 
The results are better, but there is still a lot of discolouration within the removed specular
highlights. For our application, this discolouration would prove to be just as distracting as the
specular highlights themselves, so we cannot adopt their approach either.


Unlike these approaches, we have chosen to handle specular highlights by detecting and ignoring
them. We use an ad hoc method based on thresholding to construct specular highlight masks
$F_{O,t}$. Such methods are quite common in the image-guided surgery literature, and have, for
example, been adopted by [90, 87].


In our implementation, we transform the video images into HSV space, and label pixels as 
specular if they have low saturation and high value (intensity). We calculate local thresholds 
over the saturation and value channels by partitioning the image into a regular grid of squares, 
calculating two 1D histograms over the saturation and value channels for each square, and 
defining the thresholds as percentiles of the histograms. This makes it easier to detect dim 
specular reflections without generating lots of false positives (i.e. labelling mostly-diffuse pixels 
as specular). We construct the histograms for a square $S$ centred at $p$ over a larger square
$S' \supset S$ also centred at $p$, so as to improve labelling consistency across neighbouring squares.
We typically define each $S$ to be 101 pixels wide, and each $S'$ to be 301 pixels wide. Percentile-based
thresholds are not sufficient, however, as they may lead to false positives in squares that
contain no specular highlights. So we also use spatially-invariant thresholds, labelling a pixel
as specular if and only if its value and saturation are respectively greater than and less than
both the percentile-based threshold and the spatially-invariant threshold. After completing the
labelling, we finally dilate the mask a few times, to ensure that the boundaries of the specular
highlights are sufficiently masked.
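The masking procedure described above can be sketched as follows. This is a simplified illustration rather than the implementation used in this thesis: the percentile fractions, the spatially-invariant thresholds, the 4-connected dilation and all function and parameter names are assumptions chosen for the example, and the image is assumed to be a list of rows of RGB triples with components in [0, 1].

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def percentile(values: Sequence[float], fraction: float) -> float:
    """Value at the given fraction of the sorted list (simple nearest-rank rule)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(fraction * len(ordered)))]


def dilate(mask: List[List[bool]]) -> List[List[bool]]:
    """One step of binary dilation with a 4-connected neighbourhood."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = True
    return out


def specular_mask(rgb: Sequence[Sequence[RGB]], cell: int = 101, window: int = 301,
                  sat_pct: float = 0.1, val_pct: float = 0.9,
                  sat_max: float = 0.3, val_min: float = 0.8,
                  dilations: int = 2) -> List[List[bool]]:
    """Label pixels with low saturation and high value as specular."""
    height, width = len(rgb), len(rgb[0])
    hsv = [[colorsys.rgb_to_hsv(*pixel) for pixel in row] for row in rgb]
    mask = [[False] * width for _ in range(height)]
    margin = (window - cell) // 2
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            # Histogram statistics come from the larger window around the cell.
            rows = range(max(0, top - margin), min(height, top + cell + margin))
            cols = range(max(0, left - margin), min(width, left + cell + margin))
            sats = [hsv[y][x][1] for y in rows for x in cols]
            vals = [hsv[y][x][2] for y in rows for x in cols]
            # A pixel must pass both the local and the spatially-invariant threshold.
            sat_limit = min(percentile(sats, sat_pct), sat_max)
            val_limit = max(percentile(vals, val_pct), val_min)
            for y in range(top, min(height, top + cell)):
                for x in range(left, min(width, left + cell)):
                    _, s, v = hsv[y][x]
                    mask[y][x] = s < sat_limit and v > val_limit
    for _ in range(dilations):
        mask = dilate(mask)
    return mask
```

Taking the minimum of the two saturation thresholds and the maximum of the two value thresholds implements the requirement that a pixel must satisfy both the percentile-based and the spatially-invariant tests.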
4.5 Results


4.5.1 Image Filtering 


Figures 4.7 and 4.8 show typical results given by our lightness normalisation filtering and 
specular highlight masking methods. The effect of lightness normalisation is most pronounced 
in the second of these two figures, in which the visible part of the myocardium is more curved, 
leading to greater illumination-induced appearance variations. In the original images shown in 
this figure, the observable diffuse myocardial radiance is at its greatest (and hence, ignoring the 
specular highlights, the myocardium is at its lightest) in the area surrounding the largest region 
of specular highlights. Moving away from this specular region, the radiance decreases smoothly, 
eventually giving way to shadows in the top-left and bottom-right corners (the shadows are
caused by the pericardial tissue that envelops the myocardium, part of which can be seen hanging
down on the left- and right-hand sides of the images). All frames of this sequence were like this,
since the camera and its light remained stationary, so the cardiorespiratory motion caused the
vessels (which are usually the features that we are interested in tracking) to move back and
forth between bright regions and darker regions. The lightness normalising filters help to
remove the effects of these radiance variations, making the changes in the illumination of the 
vessels less pronounced. Note that some local lightness changes due to surface creases near 
vessels remain, e.g. near the top-left corner of the frame 62 images. These can be eliminated by 
using smaller grid squares, but making the grid squares too small reduces the contrast between 


the vessels and the myocardium. 


By experimenting with the specular highlight mask thresholds, we were able to mask out most 
of the highlights, although a few dim ones remain undetected in the 230108.FO1 images. It 
is difficult (in fact, probably impossible) to find threshold values under which the mask will 
detect these without introducing more false positives, but even dim specular highlights may be 
sufficient to distract the importance sampler. One possible solution to this may be to compute 
“soft” masks, which assign a value in the interval [0,1] to each pixel, denoting an estimate 


of the proportion of its intensity that is caused by specular reflection. Such masks could be 
Figure 4.7: Examples of our lightness normalisation and specular highlight masking results for
frames 1 and 47 of the 230108.FO1 sequence. For lightness normalisation, we used 60-pixel-wide
planar surfaces with 30-pixel-wide overlaps (the image dimensions are 768 x 576). To facilitate
comparison, we have added the mean lightness of frame 0 to all lightness normalised images.
(a), (g) The original images from frames 1 and 47. (b), (h) The original L* channels. (c), (i)
Specular highlight masks (highlights shown in cyan). (d), (j) The result of applying lightness
normalisation to the L* channels and converting the images back to RGB space. (e), (k) The
lightness normalised L* channels. (f), (l) All of the $S'$ surfaces used in lightness normalisation,
rendered side by side.
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(a) Frame 38 (b) Frame 38, L* (c) Frame 38, spec. mask 


(d) Frame 38, normalised (e) Frame 38, L* normalised (f) Frame 38, S’ 


(g) Frame 62 (i) Frame 62, spec. mask 


(j) Frame 62, normalised (k) Frame 62, L* normalised (1) Frame 62, S’ 


Figure 4.8: Examples of our lightness normalisation and specular highlight masking results 
for frames 38 and 62 of the 180608.HS sequence. All of the comments in the previous figure’s 
caption apply, except that we used circular parabolic surfaces in the lightness normalising filters 
this time. 
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used by our dissimilarity function € without changing its definition, and may even simplify the 


problem of deciding what percentile to use with it. 


The ideal choice of percentile is one that gives sufficient insensitivity to outliers without sac- 
rificing €’s ability to discriminate between states that are genuinely dissimilar. Our current 
binary masks produce outliers in the form of false negatives (mostly-specular pixels labelled 
as diffuse). Each false negative occurring at a pixel that, in the absence of specular reflection, 
would be considered similar to the pixel it is being compared to, reduces the number of outliers 
that € can ignore under any given choice of percentile, i.e. reduces the extent to which it can 
ignore deformation-induced dissimilarities between similar regions. Soft masks would alleviate 
this problem by ensuring that false negatives are given low weight, so that significant outliers 
only occur at dissimilar diffuse pixel pairs. We intend to consider probabilistic approaches to 
soft mask construction in our future research. 
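As a rough illustration of how a soft mask could enter a percentile-truncated comparison, the sketch below weights each squared pixel difference by one minus the estimated specular proportion, and discards the differences that lie above the chosen fraction p of the total weight. The function name, the use of squared differences and the exact truncation rule are assumptions made for illustration only; the dissimilarity function itself is defined in section 4.2 and may differ in its details.

```python
def truncated_weighted_dissimilarity(a, b, soft_mask, p):
    """Weighted mean squared difference, ignoring the largest differences.

    a, b      : equal-length sequences of pixel intensities
    soft_mask : per-pixel estimates in [0, 1] of the specular proportion
    p         : fraction in (0, 1] of the total weight that is retained
    """
    # Hypothetical weighting: a fully specular pixel receives zero weight.
    pairs = sorted(((x - y) ** 2, 1.0 - m) for x, y, m in zip(a, b, soft_mask))
    total = sum(w for _, w in pairs)
    if total <= 0.0:
        return 0.0
    budget = p * total  # weight that may be accumulated before the cutoff
    acc_w = 0.0
    acc_d = 0.0
    for d, w in pairs:  # smallest differences first
        if acc_w >= budget:
            break
        take = min(w, budget - acc_w)
        acc_w += take
        acc_d += take * d
    return acc_d / acc_w
```

With a binary mask, an undetected highlight consumes part of the outlier allowance; with a soft mask that assigns it a weight near zero, it contributes almost nothing regardless of p.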


4.5.2 Comparative Tests 


To test the extent to which our lightness normalisation and image comparison methods help 
to improve the performance of our particle filter, we tracked 14 patches in each video using 
the likelihood evaluation methods described in this chapter (for screenshots of the results, 
see section 6.6; the results section at the end of chapter 5 gives a statistical summary of 
the amount of motion/deformation in the two videos we used), and then we performed two 
comparative experiments. In one, we tried tracking the same patches without applying our 
lightness normalisation procedure to the L* channel, and in the other, we did use lightness 
normalisation, but replaced our image region comparison function € with the weighted sum of 
squared differences between the pixels. For the latter test, we used the same weighting method 
that we discussed in section 4.2, making this test equivalent to using € with the percentile 
threshold p set to 1 (i.e. not discarding any part of the weighted pixel difference distributions). 
In all tests, we used the methods that we will discuss in chapter 6 to calculate maximum-likelihood 
estimates of the particle filter parameters (the pair of kurtosis parameters $\alpha$, the pair of 
inverse decay rate parameters $\gamma$, and two things that we will introduce in that chapter: 
a particle loss test “background likelihood” distribution and a particle loss test threshold c). 
It was necessary to do this because the components that we changed in each test changed the 
way in which the particle filter used image information, and so parameters that were optimal 
in one test were not necessarily optimal in another test on the same video. 


The results that we have plotted are based on ground truth estimates that we calculated by 
fitting deformation fields to manually-tracked landmarks (we will discuss how we did this in 
detail in chapter 5). For each particle in each time step, we calculated 5 Euclidean distances: 
the 4 distances between the particle’s vertices and the ground truth positions of the corre- 
sponding vertices of the patch we were tracking (under the affine transformations that best fit 
the deformation fields), and the distance between the particle’s centre and the ground truth 
position of the corresponding patch’s centre. We then associated the errors for each particle 
with the particle’s weight, sorted the errors, and calculated percentiles of the resulting weighted 
error distribution over all frames for each patch that we tracked. These distributions give us 
an indication of the particle filter’s degree of uncertainty for each patch under each test. We 
have plotted these percentiles for each test and each video in figure 4.9. Note that the values 
of all parameters of the particle filter that we specified manually are summarised in tables 3.1 
and 6.1 on pages 72 and 211 respectively, with the exception of the components we changed for 
each comparative test, and of the outlier cutoff p for the € dissimilarity function that we used 
in the evaluation of the patch likelihood functions $L_i$. In our results at the end of the previous 
chapter we used p = 0.75, but we used p = 0.99 for the results here, as this seemed to lead to 
better performance. 
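The weighted error percentiles described above can be sketched as follows. This is a minimal illustration, assuming that a percentile is read off as the smallest error at which the cumulative particle weight reaches the requested fraction of the total weight; the function name and this particular convention are assumptions, not necessarily the ones used to produce figure 4.9.

```python
def weighted_percentiles(errors, weights, fractions):
    """Return the error value at each requested fraction of the total weight.

    errors    : one error per (particle, frame, vertex) measurement
    weights   : the weight of the particle each error belongs to
    fractions : requested fractions, e.g. [0.25, 0.5, 0.75, 0.95]
    """
    pairs = sorted(zip(errors, weights))  # ascending by error
    total = float(sum(w for _, w in pairs))
    results = []
    for q in fractions:
        target = q * total
        cumulative = 0.0
        value = pairs[-1][0]  # fall back to the largest error
        for e, w in pairs:
            cumulative += w
            if cumulative >= target:
                value = e
                break
        results.append(value)
    return results
```

Because the errors are weighted, a heavily weighted particle that is close to the ground truth pulls the percentiles down more than many lightly weighted particles that have drifted.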


Comparing the leftmost columns of percentiles for each patch in the 230108.FO1 video results 
in figure 4.9a (which represent the error distributions under the methods we have proposed 
in this chapter) to the central columns (which represent the distributions without lightness 
normalisation), it is clear that the use of lightness normalisation reduces the errors in the vast 
majority of cases, sometimes by several pixels (as, for example, in the case of patches 58, 59, 69 
and 70). Lightness normalisation only led to much worse results with some of the percentiles 
of 2 of the 14 patches we tracked: the 50th, 75th and 95th percentiles of patch 71; and the 25th, 
50th, 75th and 95th percentiles of patch 78. It is not entirely clear why this happened, but it 
may have been a result of the lightness normalisation process inadvertently destroying some 
useful textural information about these patches. 


Figure 4.9: These figures show percentiles of the weighted error distributions that resulted from 
tracking 14 patches with our proposed method (leftmost column of each group), from tracking 
the same patches without lightness normalisation (middle columns), and from tracking the 
patches using the weighted sum-of-squared differences rather than € (rightmost columns). The 
IDs identify each of the patches that we tracked in each video. See figure 6.8 on page 212 to 
see what the patches looked like. 


Lightness normalisation led to even greater tracking performance improvements in the 180608.HS 
video, as shown in figure 4.9b. This is to be expected, as there were much greater illumination 
variations in this video than in the 230108.FO1 video, as mentioned earlier. For example, compare 
the 50th, 75th and 95th percentiles of patches 34, 35, 36, 38, 40, 42, 49 and 60 with and without 
lightness normalisation. The only percentile for which lightness normalisation led to much 
worse results was the 95th percentile of patch 51. 


The use of € led to less remarkable improvements over our tests using the weighted sum-of- 
squared differences (the results of which are shown in the rightmost columns of the patch 
groups in figure 4.9). € did bring substantial improvements to patches 58, 59, 69 and 70 in 
230108.FO1 (the 95th percentiles of which were at least 10 pixels lower with €; some of the 
other percentiles were also much lower under €), and patches 34 and 49 in 180608.HS (the 
95th percentiles of which were at least 12 pixels lower under €). However, the 95th percentile 
of patch 60 in 230108.FO1 was 9 pixels higher under €, and the 75th and 95th percentiles of 
patches 37, 43 and 51 in 180608.HS were at least 6 pixels higher under €. For both videos, the 
differences between many of the remaining percentiles were smaller, and it is not clear if these 
differences are just due to inaccuracies in the assumptions we make when calculating maximum 
likelihood estimates of the particle filter parameters, or if they indicate significant differences 
between € and the sum-of-squared differences approach. 


4.6 Conclusion 


In this chapter, we have described the methods that we used to evaluate the patch likelihood 
function $L_i$ and the local likelihood function $\ell_i$. We have shown how significant changes in 
the appearance of a patch can occur as a result of non-deformational phenomena, such as 
illumination changes and specular highlights, and we have proposed methods for reducing the 
extent to which they influence the likelihood functions, such as masking out specular highlights, 
and using a “lightness normalisation” method to remove the effects of illumination on the 
lightness channel with parametric surfaces. 


Our comparative results suggest that our lightness normalisation method can lead to substantial 
improvements in tracking performance, particularly in videos with large illumination variations. 
The image region comparison function € that we use to attenuate the influence of unmasked 
specular highlights and non-affine deformation components has shown some promise, but further 
tests are required before we can ascertain the situations in which it is most helpful. 


Issues to be Addressed in the Remainder of this Thesis 


We have now set out the approaches we have taken in designing the main components of our 
particle filter, but there are two problems related to its performance that must be addressed 
before reliable, long-term tracking results can be obtained. The rest of this thesis will focus 
on describing the solutions we have developed to some of these problems, and analysing the 
performance of these solutions. 


The first problem is that the particle filter’s output is heavily dependent on the values chosen 
for the hyperparameters of the likelihood functions. Setting the standard deviation control 
parameter, $\gamma$, too low or the kurtosis control parameter, $\alpha$, too high may prevent the importance 
sampler and patch likelihood evaluator $L_i$ from assigning sufficient probability to the correct 
state, causing the particle filter to drift. On the other hand, setting $\gamma$ too high or $\alpha$ too low will 
cause the importance sampler and patch likelihood evaluator to be insufficiently discriminative, 
creating posterior distributions with high uncertainty, which will again cause the particle filter 
to drift. We would ideally like to have an unsupervised method that selects these values based 
on information from the images. 


The second problem is to do with the fact that particles will inevitably drift from time to time. 
In the best case, such particles will end up with negligible weight, and hence contribute little 
information about the posterior, in which case updating them will be a waste of computational 
resources. In the worst case, lots of particles may become lost, impoverishing the particle filter’s 
representation of the posterior to the point where estimated posterior expectations become 
unreliable. 


We will address these issues by considering the following questions: 


1. How should the kurtosis and inverse decay rate parameters of the likelihood functions be 
chosen? 

2. How can the particle filter determine when a particle is lost? 

3. How might the particle filter restore a particle that it thinks is lost? 


If a patch is distinguishable from its surroundings, the second of these questions may be dealt 
with by performing a hypothesis test in which we compare the likelihood of a particle’s ap- 
pearance given its hypothesised state to the likelihood of its appearance given the hypothesis 
that it has drifted away. We can only perform such tests reliably if we have a good parameter- 
isation of both of these likelihood functions. Thus, the first two questions are closely related, 
and any solution to them will have to exploit information about the expected changes in the 
appearance of a patch and its surroundings. The lightness normalisation method we have dis- 
cussed in this chapter simplifies matters by removing the need to consider appearance changes 
caused by changing illumination conditions, but we must still consider the effects of myocardial 


deformations on patch appearance. 


So our aims over the next two chapters are as follows: 


1. Chapter 5 — to develop statistical models of the sequences of deformations that patches 


are likely to undergo. 


2. Chapter 6 — to use these models to estimate the parameters of the likelihood functions, 
to develop a hypothesis test for inferring whether or not particles are lost, and to discuss 


possible ways of restoring lost particles. 


The approach we shall take to estimating the likelihood parameters will involve simulating 
the deformational appearance changes of the patches, and so we will only consider generative 


statistical deformation models in the next chapter. 


Chapter 5 


Deformation Modelling 


5.1 Synopsis 


The work we will present in this chapter paves the way for the solutions we have developed to the 
first two questions posed in the previous section. As explained in section 3.2, we have assumed 
the patch deformations to be affine so as to improve the well-posedness of the tracking task and 
to make it computationally tractable, and in section 3.3.2 we approximated the deformations 
of subregions by uniformly displacing their pixels. But in reality, the deformations of the 
myocardium are highly non-linear, and the most significant type of uncertainty about the 
appearance of a region of the myocardium that must be taken into account when parameterising 
a likelihood function is the uncertainty introduced by these approximations. Developing a 
realistic statistical model of the myocardium’s non-linear deformations will allow us to simulate 
the way they change the appearance of a patch and the appearance of its surroundings, providing 
an approximation of a data set that we can then use to calculate an initial parameterisation of 


the likelihood functions. 


The deformation models we built were all calculated from our observations of intraoperatively-acquired 
myocardial image sequences. The model construction procedure is a three-stage process, 
with the stages involving the following: 

1. Dense Deformation Field Estimation - For each image sequence, we choose a reference 
frame $t_r$, and calculate “smooth” and invertible grids representing the mapping of 
each point within a region of interest from frame $t_r$ to each successive frame over a time 
period long enough to cover one cardiac cycle. 

2. Deformation Mode Modelling - We generate a set of subgrids placed at randomly-selected 
locations within the domain of the deformation fields, and use the fields to deform 
the subgrids over the whole image sequence. We remove the affine component of 
the subgrid deformations, and calculate principal deformation modes with which the 
dimensionality of the subgrids can be reduced. 

3. Deformation Trajectory Mode Modelling - We use the reduced-dimensionality deformation 
model to calculate sequences of parameters from which the subgrid sequences 
can be reconstructed. Then we calculate principal modes for these parameter sequences, 
and once again reduce their dimensionality. 


In the following sections, we will discuss our implementation of each of these processes. 


5.2 Dense Deformation Field Estimation 


The first stage of the model construction procedure involves constructing dense deformation 
fields from which the motion of any point within a region of interest can be calculated. More 
specifically, given a reference frame $t_r$ and a region of interest $\mathcal{R}$, we wish to construct vector 
fields 

$$v_t : \mathbb{R}^2 \to \mathbb{R}^2 \tag{5.1}$$

such that $v_t(p)$ denotes the position at time $t$ of the point that was at position $p \in \mathcal{R}$ at time 
$t_r$. It is vital for the $v_t$ to be invertible, as we will need to explicitly invert them later. The 
$v_t$ should also be reasonably smooth, reflecting the smoothness of the observed deformations. 
So in short, we may say that the $v_t$ should be (approximations of) diffeomorphisms*. For 
notational convenience, we will sometimes refer to the corresponding displacement fields 

$$\Delta v_t(p) = v_t(p) - p , \tag{5.2}$$

which give the displacement of point $p \in \mathcal{R}$ from time $t_r$ to time $t$. 


*A diffeomorphism is a differentiable map with a differentiable inverse. We will not need to calculate 
derivatives of our deformation fields however, so we do not need them to be differentiable everywhere. 


Figure 5.1: An example of the trajectories followed by 5 manually-tracked points over a cardiac 
cycle, with units in pixels. The circular markers denote the locations of the points at 5- 
frame intervals. Each trajectory begins at the top-left marker. The range of displacements 
of the points, relative to their initial positions, is up to 300 pixels horizontally and 200 pixels 
vertically. The apparent lack of periodicity is due to the effects of respiratory motion. 


As shown in figure 5.1, points on the surface of the myocardium undergo very large displacements 
from their initial position over the course of a cardiac cycle. Most optical flow and image 
registration algorithms would struggle to calculate such large displacements accurately. The 
simplest solution one might try is to use optical flow to calculate displacement fields 
$\Delta\tilde{v}_t$ between consecutive frames $t - 1$ and $t$, and then define $v_t$ by composing the 
corresponding interframe fields $\tilde{v}_t$: 

$$v_t = \tilde{v}_t \circ \cdots \circ \tilde{v}_{t_r+1} . \tag{5.3}$$

This may work well when the amount of interframe motion is small. But in the image sequences 
we have been working with, the interframe motion can be as large as 30-40 pixels. Errors are 
bound to occur with such large displacements, especially in the large homogeneous regions 
(shown, for example, in figure 5.3), and over time these errors will accumulate, leading to poor 
displacement estimates. 




More reliable initial estimates of the solution can be obtained by manually tracking as dense 
a set of landmarks as possible over a cardiac cycle, taking $\mathcal{R}$ to be their convex hull (which 
can be determined from their Delaunay triangulation), and then estimating the motion at all 
other points in $\mathcal{R}$ by one of the many interpolation/approximation schemes published in the 
literature. Any such initial solution estimate $\Delta v_{t,0}$ can potentially be refined either by using it 
as the initial estimate in an optical flow algorithm between images $\mathcal{I}_{t_r}$ and $\mathcal{I}_t$, or by using 
$\Delta v_{t,0}$ to transform $\mathcal{I}_t$ into an image $\mathcal{I}'_t$ that is more similar to $\mathcal{I}_{t_r}$ by backprojecting 
$\mathcal{I}_t$’s pixels: 

$$\mathcal{I}'_t(p) = \mathcal{I}_t(v_{t,0}(p)) , \tag{5.4}$$

then calculating the optical flow $\Delta v'$ between $\mathcal{I}_{t_r}$ and $\mathcal{I}'_t$, and finally defining $\Delta v_t$ as an 
additive update of $\Delta v_{t,0}$: 

$$\Delta v_t(p) = \Delta v_{t,0}(p) + \Delta v'(p) . \tag{5.5}$$
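The backprojection and additive update steps can be sketched as follows. This is a minimal illustration assuming greyscale images stored as NumPy arrays, deformation fields sampled at every pixel, and bilinear sampling of the image; the function names and the border clamping are assumptions made for the example, and the optical flow step that would produce the residual displacement is not shown.

```python
import numpy as np

def backproject(image_t, v_t0):
    """Backproject frame t's pixels into the reference frame.

    image_t : (H, W) greyscale image of frame t
    v_t0    : (H, W, 2) array; v_t0[y, x] = (x', y') is the estimated
              position in frame t of the reference-frame pixel (x, y)
    """
    H, W = image_t.shape
    # Clamp sample positions to the image (an assumed border policy).
    x = np.clip(v_t0[..., 0], 0, W - 1)
    y = np.clip(v_t0[..., 1], 0, H - 1)
    x0 = np.minimum(np.floor(x).astype(int), W - 2)
    y0 = np.minimum(np.floor(y).astype(int), H - 2)
    bx, ay = x - x0, y - y0
    top = (1 - bx) * image_t[y0, x0] + bx * image_t[y0, x0 + 1]
    bottom = (1 - bx) * image_t[y0 + 1, x0] + bx * image_t[y0 + 1, x0 + 1]
    return (1 - ay) * top + ay * bottom

def refine(dv_t0, dv_residual):
    """Additive displacement update; both fields share one pixel grid."""
    return dv_t0 + dv_residual
```

If the initial estimate were perfect, the backprojected image would match the reference image wherever the sampled positions fall inside frame t, and the residual flow would be zero.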


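So, given manually-defined sets of landmarks $Q_t$ such that every point $q_{t,i} \in Q_t$ is known to 
correspond to a point $q_{t',i} \in Q_{t'}$ for all times $t$ and $t'$ at which the landmarks are defined, we 
want each $v_t$ to minimise $\epsilon^2[v_t]$, the sum of squared differences between $v_t(Q_{t_r})$ and $Q_t$: 

$$\epsilon^2[v_t] = \epsilon^2[v_{t,x}] + \epsilon^2[v_{t,y}] , \tag{5.6}$$

where: 

$$\epsilon^2[v_{t,x}] = \sum_{i=1}^{|Q_t|} \left( v_{t,x}(q_{t_r,i}) - q_{t,i,x} \right)^2 , \qquad \epsilon^2[v_{t,y}] = \sum_{i=1}^{|Q_t|} \left( v_{t,y}(q_{t_r,i}) - q_{t,i,y} \right)^2 . \tag{5.7}$$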


Clearly, if there is at least one diffeomorphism that interpolates the landmarks, there must 
be uncountably many (since small changes can be made to $v_t$ anywhere except for at the 
landmarks). Furthermore, there are even more diffeomorphisms that approximate the landmark 
trajectories to within any given error bound. Thus, to find a unique solution, we must introduce 
additional constraints. 


In [38], large classes of regularising constraints are presented under which unique solutions 
to this approximation problem can be obtained. A regulariser will typically be designed to 
maximise the smoothness of an approximating function, so that any differences in the solutions 
induced by different regularisers will be the result of differences in the way the regularisers 
measure smoothness. As $\epsilon^2$ contains no terms that depend on the fitting error in both the $x$ 
and $y$ directions, regularising $v_t$’s output dimensions independently reduces the approximation 
problem to that of finding a scalar field for each dimension. 


Taking the idea of a function’s roughness being a measure of the extent to which it oscillates, 
the authors of [38] show that a functional $\varsigma[\cdot]$ that measures the roughness of a scalar field $f$ 
can be defined by measuring the amplitude of the high-frequency components of $f$’s Fourier 
transform; i.e., if $\tilde{f}$ is the Fourier transform of $f$ and $\frac{1}{\tilde{G}}$ is a high-pass filter, then 

$$\varsigma[f] = \int \frac{|\tilde{f}(s)|^2}{\tilde{G}(s)} \, ds , \tag{5.8}$$

where the integration is performed over the frequency domain. The problem of defining $f$ 
is thus reduced to that of selecting a high-pass filter $\frac{1}{\tilde{G}}$ and a parameter $\lambda$ that controls the 
trade-off between minimising $\epsilon^2$ and minimising $\varsigma$: 

$$f = \arg\min_{f'} \; \epsilon^2[f'] + \lambda \varsigma[f'] . \tag{5.9}$$
ff 


A popular approximation method (used in many related areas of computer vision, such as 
interpolating sparse reconstructions of surfaces [44], registration for shape matching [10] and 
parameterising deformations for 3D myocardial motion tracking [87]) is the Thin-Plate Spline, 
which filters out low-frequency components with 

$$\frac{1}{\tilde{G}(s)} = \|s\|^4 . \tag{5.10}$$


Filters like this that act isotropically in the Fourier domain (and hence smooth isotropically in 
the original space) are known as Radial Basis Functions. As shown in [38] (and more formally 
in [114]), using Radial Basis Function filters like this leads to solutions of eq. (5.9) that just 
involve solving a linear system of equations. In particular, Thin-Plate Splines give the following 
solution for the $x$ ordinates of $v_t$ (the uniqueness of this solution was proven in [30]): 

$$v_{t,x}(p) = \sum_{i=1}^{|Q_t|} a_i \, G(p - q_{t_r,i}) + b^\top p + c , \tag{5.11}$$

where $G(a) \triangleq \|a\|^2 \ln \|a\|$ is the Fourier transform of $\tilde{G}$, and the $|Q_t|$-vector $a$, 2-vector $b$ and 
scalar $c$ are coefficients given by the solution of 

$$\begin{pmatrix} G + \lambda I & Q \\ Q^\top & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} q_{t,1,x} \\ \vdots \\ q_{t,|Q_t|,x} \\ 0 \end{pmatrix} , \tag{5.12}$$

where $G$ is the $|Q_t| \times |Q_t|$ matrix $G_{ij} = G(q_{t_r,i} - q_{t_r,j})$, and $Q$ is the $|Q_t| \times 3$ matrix with 
$i$th row given by $Q_{i\cdot} \triangleq (q_{t_r,i,x}, q_{t_r,i,y}, 1)$. This can be solved easily using standard linear algebra 
techniques, such as the LU-decomposition (e.g. see [40]). $v_{t,y}$ is defined by solving a similar 
linear system in which the vector on the right-hand side contains the $y$ ordinates of the $q_{t,i}$ 
terms. 
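To make the structure of this linear system concrete, the following sketch fits and evaluates one output ordinate of a Thin-Plate Spline with NumPy. It is an illustration under stated assumptions rather than the implementation used for our results: the function names are hypothetical, the regularisation parameter is assumed to enter as an unscaled multiple of the identity added to the kernel matrix, and the kernel is given its limiting value of zero at coincident points.

```python
import numpy as np

def tps_kernel(r):
    """G(a) = ||a||^2 ln ||a||, with the limit value 0 at r = 0."""
    out = np.zeros_like(r, dtype=float)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def fit_tps(src, values, lam=0.0):
    """Solve the Thin-Plate Spline linear system for one output ordinate.

    src    : (n, 2) landmark positions in the reference frame
    values : (n,) x (or y) ordinates of the same landmarks in frame t
    lam    : regularisation parameter lambda (assumed unscaled)
    """
    n = src.shape[0]
    G = tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2))
    Q = np.hstack([src, np.ones((n, 1))])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = G + lam * np.eye(n)
    A[:n, n:] = Q
    A[n:, :n] = Q.T
    rhs = np.concatenate([values, np.zeros(3)])
    sol = np.linalg.solve(A, rhs)  # dense LU-based solver
    return sol[:n], sol[n:n + 2], sol[n + 2]

def eval_tps(p, src, a, b, c):
    """Evaluate the fitted spline at the query points p of shape (m, 2)."""
    K = tps_kernel(np.linalg.norm(p[:, None, :] - src[None, :, :], axis=2))
    return K @ a + p @ b + c
```

Calling `fit_tps` once with the x ordinates and once with the y ordinates of the landmarks in frame t gives both components of the field. Setting `lam=0.0` reproduces the landmark values exactly, while very large values drive the kernel coefficients towards zero and leave only the affine part.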


The Thin-Plate Spline regulariser can be shown to satisfy the following equivalence (see [114]): 

$$\varsigma[f] = \int |\tilde{f}(s)|^2 \, \|s\|^4 \, ds = \int \|H_f(p)\|^2 \, dp , \tag{5.13}$$

where $H_f(p)$ is the Hessian matrix of $f$ at point $p$ and the norm is the Frobenius norm. From 
this alternative expression of the regulariser, it becomes clear that the regulariser is minimised 
when the scalar field $f$ is linear in its argument. This means that $v_t$ minimises the regularisers 
in each output dimension when it is an affine function. In fact, the solution given by eq. (5.11) 
tends to an affine vector field as $\lambda \to \infty$. At the other extreme, when $\lambda = 0$, the solution 
interpolates the landmarks exactly (this follows from the construction of the linear system in 
eq. (5.12)). 


In the ideal case, if the $v_t$ could be calculated perfectly, no motion would be visible in the 
backprojected images $\mathcal{I}'_t$ (although they may still appear to change with time due to drifting 
specular highlights, changes of texture as the myocardium stretches and contracts, and illumination 
changes; we did not make use of lightness normalisation in this chapter, as we calculated point 
correspondences manually). For low values of the regularisation parameter $\lambda$, the $\mathcal{I}'_t$ tended to 
undergo unnatural-looking oscillatory warps, suggesting that $v_t$ was overfitting the data. We 
found that the smoothness brought about by increasing $\lambda$ seemed to decrease the magnitude of 
these warps, so we chose the value of $\lambda$ by manually searching for the largest value that led to 
a landmark fitting error of no more than about 5-7 pixels*. Unfortunately, it would be difficult 
to quantify the extent to which increasing $\lambda$ decreased the warping, as the warping is only 
noticeable in the regions of $\mathcal{I}'_t$ that we are unable to reliably track (such as the homogeneous 
regions), and it is not noticeable in the $v_t$ at all. Figures 5.2 and 5.3 show typical examples of 
our results. 


Figure 5.2: (a) This figure shows the median Euclidean error between the true landmark 
positions and the Thin-Plate Spline estimates of their positions in each frame for various values 
of $\lambda$. (b) This figure shows the maximum Euclidean fitting errors. In this case, the largest 
value of $\lambda$ for which the errors are < 5 pixels is $\lambda \approx 2500$. 


*We believe 5 pixels to be a reasonable upper bound on the accuracy with which we can manually track 
landmarks, and we are willing to accept motion estimation errors of up to 10-12 pixels. If this seems like 
an overly generous error allowance, the landmark fitting error distributions shown in figure 5.10 on page 162 
suggest that the actual motion estimation errors that occurred at the landmarks during our tests were less than 
10-12 pixels in the vast majority of cases. 


After calculating the Thin-Plate Spline coefficients, we evaluated the Thin-Plate Splines over 
a regular grid so that we could use a grid-based approximation of the deformation fields in the 
remaining work in this chapter. Given a deformation field $v_t(p)$, a grid-based approximation 
$V_t(p)$ defined over a regular grid $\mathcal{G} \subset \{o_x + i\delta_g : i \in \mathbb{N}\} \times \{o_y + i\delta_g : i \in \mathbb{N}\}$ (for some offset 
$o$) returns $v_t$’s value at the grid vertices, and approximates it in-between by interpolation. We 
used bilinear interpolation: 

$$V_t(p) = \begin{cases} v_t(p) , & p \in \mathcal{G} \\ (1 - \alpha)\{(1 - \beta) v_{i,j} + \beta v_{i+1,j}\} + \alpha\{(1 - \beta) v_{i,j+1} + \beta v_{i+1,j+1}\} , & \text{otherwise,} \end{cases} \tag{5.14}$$

where $\frac{p_x - o_x}{\delta_g} \in [i, i+1)$, $\frac{p_y - o_y}{\delta_g} \in [j, j+1)$ and 

$$v_{a,b} = v_t(o + \delta_g (a, b)^\top) , \qquad \alpha = \frac{p_y - o_y - j\delta_g}{\delta_g} , \qquad \beta = \frac{p_x - o_x - i\delta_g}{\delta_g} . \tag{5.15}$$

The grid vertex spacing $\delta_g$ can be as small as desired, but it need not be smaller than the 
largest size at which bilinear interpolation gives an adequate approximation of the Thin-Plate 
Spline deformations. 
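The bilinear scheme can be sketched in a few lines. The array layout (first index along x, second along y), the function name and the clamping of points on the grid boundary to the last cell are assumptions made for this illustration.

```python
import numpy as np

def bilinear_field(V, origin, spacing, p):
    """Evaluate a grid-based field at the point p = (px, py).

    V       : array of shape (nx, ny, 2); V[i, j] is the field's value at
              the vertex origin + spacing * (i, j)
    origin  : grid offset (ox, oy)
    spacing : grid vertex spacing delta_g
    """
    u = (p[0] - origin[0]) / spacing
    v = (p[1] - origin[1]) / spacing
    # Clamp so that points on the last row/column use the final cell.
    i = min(max(int(np.floor(u)), 0), V.shape[0] - 2)
    j = min(max(int(np.floor(v)), 0), V.shape[1] - 2)
    beta = u - i    # fractional position along x
    alpha = v - j   # fractional position along y
    return ((1 - alpha) * ((1 - beta) * V[i, j] + beta * V[i + 1, j])
            + alpha * ((1 - beta) * V[i, j + 1] + beta * V[i + 1, j + 1]))
```

Since bilinear interpolation reproduces affine fields exactly, any interpolation error comes from the non-affine part of the spline within a cell, which is what governs how coarse the grid spacing can be.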


The advantages of grid-based representations over Thin-Plate Splines are that: bilinearly- 
interpolated grid-based deformations can be inverted exactly very efficiently (we will explain 
how in section 5.3.1), whereas inverting Thin-Plate Splines would require a more difficult, and 
less efficient, iterative method; the sum of two bilinearly-interpolated grid-based vector fields 
can be calculated exactly when they are defined on the same grid by simply adding the vectors 
at each grid vertex (the grids would have to be resampled if they differed), and so they are 


well-suited to the refinement scheme in eq. (5.5). 


In our tests, we found that calculating the Thin-Plate Spline deformation fields between frame 
$t_r$ and each $t$ directly and then replacing them with a grid-based representation always created 
invertible transformations. There is no guarantee that this will always be true however. When 
the result is non-invertible, the frame-by-frame compositional method of eq. (5.3) may be 
a viable alternative. If that still does not work, the trajectories of the landmarks can be 
upsampled first, so that the compositional approach can be applied with arbitrarily small 
displacements between time steps. 


5.3 Deformation Mode Modelling 


After constructing the deformation fields, the next stage is to develop models of the principal 
modes of deformation that we can expect the patches we want to track and their immediate 
surroundings to undergo. As our ultimate goal is to construct a statistical model from which we 
can simulate myocardial surface deformations, it is beneficial to adopt Occam’s razor by search- 
ing for the lowest-dimensional representation of the model that explains a sufficient amount 
of the observable deformation data. The benefits of this are twofold. Firstly, by constructing 
models that can explain the most significant parts of the observed deformations whilst deviating 
from them by no more than a tolerable amount, we may be able to filter out some anomalies 
in the data brought about by errors in the deformation field calculation process. Secondly, 
minimising the dimensionality of the representation minimises the number of samples that are 


needed to parameterise it. 
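One common way of choosing the dimensionality is to keep the fewest principal modes that explain a target fraction of the variance in the data. The sketch below illustrates this, assuming that each sample is a flattened residual grid; the function name and the 95% default are hypothetical choices for the example, not the values we used.

```python
import numpy as np

def principal_modes(samples, variance_fraction=0.95):
    """PCA of flattened grids, keeping the fewest modes reaching the target.

    samples : (n_samples, n_dims) array, one flattened residual grid per row
    Returns the mean, the retained modes (one per row) and their variances.
    """
    mean = samples.mean(axis=0)
    centred = samples - mean
    # Rows of Vt are the principal directions; s holds the singular values.
    _, s, Vt = np.linalg.svd(centred, full_matrices=False)
    variances = s ** 2 / max(samples.shape[0] - 1, 1)
    cumulative = np.cumsum(variances) / variances.sum()
    k = int(np.searchsorted(cumulative, variance_fraction) + 1)
    k = min(k, len(s))
    return mean, Vt[:k], variances[:k]
```

Lowering the variance fraction discards more of the low-variance modes, which is where anomalies introduced by the deformation field calculation are most likely to reside.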


As we already have a parameterisation of the affine components of the patch deformations 
(eq. (3.1)), we neither need nor want our deformation mode models to contain any affine 
components. So we base the models on an analysis of the residual effects of the deformation 
fields on a canonical grid g*, after removing the affine components of the deformations that 
they induce on it (the affine components are reintroduced in the final stage, in which we model 
deformation trajectory modes). The model also needs to incorporate the uncertainties brought 
about by state information that will be unknown at simulation time, such as the phases of the 
cardiac cycle that the heart will be in during the frames from which we will want to begin the 
simulations, and the orientations and positions of the regions whose deformations we will be 
simulating in these frames. 


*We will defer specifying the forms of $g$ that we used until section 5.3.3. Until then, $g$ should just be 
considered to be a fixed configuration of 2D points. 


Let $T$ be the length of the cardiac cycle that a deformation field sequence is defined over, so 
that the fields are defined over frames $T_r = [t_r, \ldots, t_r + T - 1]$. We construct a deformation 
mode model from a set $G_S$ of data sample tuples 

$$(o, f, g'_{T_r}, h_{T_r}, \hat{g}_{T_r}) , \tag{5.16}$$

where: 

$o \in \{0, \ldots, T - 1\}$ is a cardiac phase offset random variable that we use to model the 
uncertainty concerning the phases that the simulations may start from; 

$f$ is a randomly-selected similarity transformation (i.e. a transformation that only consists of 
isotropic scaling, translation and rotation) that maps $g$ into an initial state within frame 
$t_r + o$, so as to account for variations in the positions, scales and orientations of the regions 
we will need to simulate deformations of; 

$g'_{T_r}$ is the sequence of deformed instances of $f(g)$ defined by the deformation fields, under the 
constraint that $g'_{t_r+o} = f(g)$. Since the deformation fields are defined relative to frame $t_r$, 
we have to enforce this constraint by defining $g'_t$ in terms of $v_{t_r+o}$’s inverse $v^{-1}_{t_r+o}$: 

$$g'_t = v_t(v^{-1}_{t_r+o}(f(g))) ; \tag{5.17}$$

$h_{T_r}$ is a sequence of transformations that affinely registers each $g'_t$ to $g$; 

$\hat{g}_{T_r}$ is the result of registering each $g'_t$ to $g$ with $h_t$. It represents the sequence of non-affine 
deformation components. 


Since: the regions we will be simulating deformations of could lie anywhere on the visible part of the myocardium; we have no prior information about what the distance of the endoscope from the myocardium will be or what its orientation about the viewing direction will be; and we may need to begin the simulation at any cardiac phase; we take the most noncommittal approach of sampling the pair $(o, f)$ from a distribution that is close to uniform (we will define the sampling procedure precisely in section 5.3.3). Bounds are imposed on this distribution by the constraint that $f(g)$ must lie within $v_{t_s+o}$'s image, so that we can calculate $g'_{\mathcal{T}_s}$. The process of constructing $G_S$ is summarised in algorithm 5.1.
Algorithm 5.1: Constructing the deformation mode model sample data set $G_S$.

Result: A set $G_S$ of $N$ data sample tuples.

$G_S := \emptyset$
for $i := 1$ to $N$ do
    $(o, f)$ := a (cardiac phase offset, similarity transformation) pair drawn from an approximately uniform distribution bounded by the constraint that $f(g)$ lies in the image of $v_{t_s+o}$
    $t := t_s$
    for $j := 1$ to $T$ do
        $g'_t := v_t(v_{t_s+o}^{-1}(f(g)))$
        $h_t$ := an affine transformation that registers $g'_t$ to $g$
        $g_t := h_t(g'_t)$
        $t := t + 1$
    end
    $G_S := G_S \cup \{(o, f, g', h, g)\}$
end
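The loop structure of algorithm 5.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the deformation fields and their inverses are assumed to be available as callables (`fields[j]` and `inv_fields[j]` for the j-th frame of the cycle), and `sample_offset_and_similarity` and `register_affine` are hypothetical stand-ins for the sampling and registration procedures.

```python
import numpy as np

def build_sample_set(g, fields, inv_fields, sample_offset_and_similarity,
                     register_affine, n_samples):
    """Sketch of algorithm 5.1 (all argument names are illustrative).

    g: (K, 2) array of canonical grid vertices.
    fields[j], inv_fields[j]: callables applying the deformation field of the
        j-th frame of the cycle, and its inverse, to a (K, 2) array.
    sample_offset_and_similarity: callable returning (o, f), f a callable.
    register_affine: callable (g_deformed, g) -> h, with h a callable.
    """
    g = np.asarray(g, dtype=float)
    T = len(fields)
    samples = []
    for _ in range(n_samples):
        o, f = sample_offset_and_similarity()
        # Pull f(g) back to the reference frame once, then push it forward.
        ref = inv_fields[o](f(g))
        g_def, hs, g_reg = [], [], []
        for j in range(T):
            gj = fields[j](ref)           # deformed instance of f(g)
            hj = register_affine(gj, g)   # affine registration to g
            g_def.append(gj)
            hs.append(hj)
            g_reg.append(hj(gj))          # non-affine residual
        samples.append((o, f, g_def, hs, g_reg))
    return samples
```

By construction, the deformed grid at offset `o` equals `f(g)`, which is the constraint that the inverse field is there to enforce.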


It is important to mention that our construction of $G_S$ is unable to adequately take into account perspective changes caused by variations in the angles of inclination of the regions we will be simulating deformations of, since we do not have access to depth information. Knowing the depth of the pixels in the image sequences would allow us to construct the deformation fields and deformation mode models in 3D, which would lead to simulations in which perspective changes emerge naturally. Without depth information, the best we can do is to construct sample sets from multiple deformation field sequences and merge them together, so that perspective changes will be “baked into” the deformation mode and deformation trajectory mode models.


In the next two sections we will explain how to invert the deformation fields and how to carry out the affine registration of two grids. After that, we will discuss the details of the various approaches to deformation mode calculation that we employed.


5.3.1 Deformation Field Inversion 


As we use a grid-based representation for the deformation fields, calculating $v_t^{-1}(p)$ involves identifying the square $S$ of the grid $G$ that satisfies $p \in v_t(S)$, and then calculating the interpolation coefficients of the point in $S$ that $v_t$ maps to $p$. $S$ can be determined efficiently by the use of an inverse lookup table, which only needs to be calculated once for each $v_t$.


The lookup table is defined by a finite regular grid $G'$ that covers $v_t$'s image such that each square $S'$ of $G'$ is associated with a subset $H(S')$ of $G$'s grid squares. The grid spacing of $G'$ we use is the mean side length of $v_t(G)$'s quadrilaterals. As it is trivial to identify the square $S'(p)$ of $G'$ containing the point $p$ that we want to apply the inverse transformation to, defining $H(S'(p))$ such that

$$S \in H(S'(p)) \iff v_t(S) \cap S'(p) \neq \emptyset \qquad (5.18)$$

means that we only need to search $H(S'(p))$ to find the square $S(p)$ of $G$ that contains $v_t^{-1}(p)$.


We identify the square by verifying that $v_t(S(p)) = \square abdc$ contains $p$ (see figure 5.4b). The containment test is performed on triangles $\triangle adc$ and $\triangle abd$, using the fact that a point $q$ lies in triangle $\triangle xyz$ if and only if $q$ is on the same side of line $\overline{xy}$ as $z$, $\overline{xz}$ as $y$ and $\overline{yz}$ as $x$, which in turn can be determined using the fact that $q$ lies on the same side of a line $\overline{rs}$ as a point $u$ if and only if

$$(\vec{rq} \times \vec{rs})(\vec{ru} \times \vec{rs}) > 0, \qquad (5.19)$$

where $\times$ denotes the 2D cross product:

$$v \times w = v_x w_y - v_y w_x. \qquad (5.20)$$


Although the optimal way to define which $H$ sets to add each square $S$ of $G$ to would be to rasterise $v_t(S)$ with respect to $G'$, it is much simpler and not much less efficient to just add $S$ to each $H(S')$ for which $S'$ intersects $v_t(S)$'s bounding box, as shown in figure 5.4a.
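The lookup-table construction and the triangle-based containment test can be sketched as below. This is an illustrative sketch only: the deformed grid is assumed to be given as a dictionary `quads` mapping each square of $G$ to its deformed corners `(a, b, c, d)`, the names `build_lookup` and `find_square` are hypothetical, and the same-side test uses `>=` rather than the strict inequality of eq. (5.19) so that points on the shared diagonal of the two triangles are not missed.

```python
from collections import defaultdict
from math import floor

def cross(v, w):
    """2D cross product (eq. 5.20)."""
    return v[0] * w[1] - v[1] * w[0]

def same_side(q, u, r, s):
    """True if q lies on the same side of the line through r and s as u."""
    rs = (s[0] - r[0], s[1] - r[1])
    rq = (q[0] - r[0], q[1] - r[1])
    ru = (u[0] - r[0], u[1] - r[1])
    # Assumption: >= instead of > so that edge points count as contained.
    return cross(rq, rs) * cross(ru, rs) >= 0

def in_triangle(q, x, y, z):
    return (same_side(q, z, x, y) and same_side(q, y, x, z)
            and same_side(q, x, y, z))

def in_quad(p, a, b, c, d):
    """Containment in quadrilateral abdc, tested on triangles adc and abd."""
    return in_triangle(p, a, d, c) or in_triangle(p, a, b, d)

def cell_of(p, origin, spacing):
    """Index of the lookup-grid cell containing p."""
    return (floor((p[0] - origin[0]) / spacing),
            floor((p[1] - origin[1]) / spacing))

def build_lookup(quads, origin, spacing):
    """Add each square S to every lookup cell its bounding box touches."""
    H = defaultdict(list)
    for S, corners in quads.items():
        xs = [v[0] for v in corners]
        ys = [v[1] for v in corners]
        ix0, iy0 = cell_of((min(xs), min(ys)), origin, spacing)
        ix1, iy1 = cell_of((max(xs), max(ys)), origin, spacing)
        for ix in range(ix0, ix1 + 1):
            for iy in range(iy0, iy1 + 1):
                H[(ix, iy)].append(S)
    return H

def find_square(p, quads, H, origin, spacing):
    """Search only the candidates stored in p's lookup cell."""
    for S in H.get(cell_of(p, origin, spacing), []):
        if in_quad(p, *quads[S]):
            return S
    return None
```

The bounding-box rule may register a square in cells that its quadrilateral does not actually overlap; the containment test filters those candidates out at query time.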


The final step of calculating the interpolation coefficients $\alpha$ and $\beta$ (as in eq. (5.14)) that define $v_t^{-1}(p)$ involves solving a quadratic equation. With reference to figure 5.4b, we use the solution given in [16], where the author points out that the $\alpha$ parameter takes the unique value that makes the points $a + \alpha\vec{ac}$ and $b + \alpha\vec{bd}$ collinear with $p$. This collinearity requirement is equivalent to stating that the sine of the angle between $a + \alpha\vec{ac} - p$ and $b + \alpha\vec{bd} - p$ is zero, which implies that their cross product is zero, which in turn implies:

$$(\vec{ac} \times \vec{bd})\alpha^2 + (\vec{bd} \times \vec{ap} - \vec{ac} \times \vec{bp})\alpha + (\vec{ap} \times \vec{bp}) = x\alpha^2 + y\alpha + z = 0. \qquad (5.21)$$

Figure 5.4: (a) The time $t_s$ regular grid $G$ is deformed by $v_t$. Each coloured quadrilateral $v_t(S)$ in the deformed grid corresponding to a grid square $S$ of $G$ has a bounding box indicated with dashed lines and is added to $H(S')$, for each square $S'$ of $G'$ shaded in a similar colour to $v_t(S)$. (b) The bilinear interpolation coefficient $\alpha$, which is used in the calculation of the position of $v_t^{-1}(p)$ in the grid square of $G$ that is mapped to $\square abdc$, makes $a + \alpha\vec{ac}$, $p$ and $b + \alpha\vec{bd}$ collinear.
As explained in [16], a numerically-stable solution to this quadratic equation that lies in the interval $[0, 1]$ is

$$\alpha = \begin{cases} \dfrac{-y - \sqrt{y^2 - 4xz}}{2x}, & \text{if } y > 0, \\[2ex] \dfrac{2z}{-y + \sqrt{y^2 - 4xz}}, & \text{otherwise.} \end{cases} \qquad (5.22)$$

For non-degenerate quadrilaterals, $x = 0$ implies that $\vec{ac}$ and $\vec{bd}$ are parallel, which in turn implies that the angle from $\vec{bd}$ to $\vec{ap}$ is $\le 0$ and the angle from $\vec{ac}$ to $\vec{bp}$ is $\ge 0$. At most one of these inequalities can be equal to zero, so $y < 0$ and the second case is executed. Also, $z = 0$ implies that $\vec{ap}$ is collinear with $\vec{bp}$, but $y$ is non-zero in this case for non-degenerate quadrilaterals. So this solution only fails when $x = y = 0$, which occurs when either $b = d = p$ or $a = c = p$, or when $\square abdc$'s vertices are collinear. Such degeneracies never occur in our application, as they could only be caused by physically impossible deformations.
The $\beta$ parameter is proportional to the projection of $p - (a + \alpha\vec{ac})$ onto $b + \alpha\vec{bd} - (a + \alpha\vec{ac})$:

$$\beta = \frac{\|p - (a + \alpha\vec{ac})\|}{\|b + \alpha\vec{bd} - (a + \alpha\vec{ac})\|}. \qquad (5.23)$$

Finally, we have

$$v_t^{-1}(p) = S(p)_a + \alpha(S(p)_c - S(p)_a) + \beta(S(p)_b - S(p)_a), \qquad (5.24)$$

where $S(p)_a$, $S(p)_b$ and $S(p)_c$ are the vertices of $S(p)$ that correspond to $a$, $b$ and $c$.
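The calculation of the two interpolation coefficients can be sketched as follows. This is a hedged illustration, not the implementation from [16]. It assumes the corners are ordered so that $a$ and $b$ are adjacent, with $c$ and $d$ the corners reached from them along $\vec{ac}$ and $\vec{bd}$. The branch taken when $y > 0$ is an assumption: it is the algebraically equivalent form of the same root, chosen to avoid cancellation. The second coefficient is computed as a signed projection, which coincides with the ratio of lengths when the three points are collinear.

```python
import math

def cross(v, w):
    """2D cross product."""
    return v[0] * w[1] - v[1] * w[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def inverse_bilinear(p, a, b, c, d):
    """Return (alpha, beta) such that
    p = (1 - beta) * (a + alpha * ac) + beta * (b + alpha * bd)."""
    ac, bd = sub(c, a), sub(d, b)
    ap, bp = sub(p, a), sub(p, b)
    # Coefficients of the quadratic x * alpha^2 + y * alpha + z = 0.
    x = cross(ac, bd)
    y = cross(bd, ap) - cross(ac, bp)
    z = cross(ap, bp)
    root = math.sqrt(y * y - 4.0 * x * z)
    if y > 0:
        alpha = (-y - root) / (2.0 * x)   # assumed branch; needs x != 0
    else:
        alpha = 2.0 * z / (-y + root)     # also valid when x == 0
    e = (a[0] + alpha * ac[0], a[1] + alpha * ac[1])   # a + alpha * ac
    f = (b[0] + alpha * bd[0], b[1] + alpha * bd[1])   # b + alpha * bd
    ef, ep = sub(f, e), sub(p, e)
    beta = (ep[0] * ef[0] + ep[1] * ef[1]) / (ef[0] ** 2 + ef[1] ** 2)
    return alpha, beta
```

For the unit square with $a = (0,0)$, $b = (1,0)$, $c = (0,1)$, $d = (1,1)$, the quadratic degenerates to a linear equation ($x = 0$) and the second branch returns the expected coordinates directly.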


5.3.2 Affine Point Set Registration 


The affine components of the transformations that map the grids of the sequence $g'_{\mathcal{T}_s}$ to the canonical grid $g$ can be expressed as the solutions of least squares optimisation problems. In sections 5.3.4 and 5.3.5, we will describe how we construct deformation mode models under two different types of canonical grid, each of which defines the affine component as the minimum of a different type of optimisation problem. We will define and solve these problems now, so as not to break up the discourse when we describe the deformation mode models.


Let $Y$ and $X$ be $d \times N$ matrices of $N$ $d$-dimensional column vectors representing $g$ and one of the grids of $g'_{\mathcal{T}_s}$, respectively*. The first optimisation problem is the unconstrained problem of calculating a $d \times d$ matrix $M$ that minimises the sum of squared differences between the vectors of $MX$ and $Y$:

$$M = \arg\min_{M'} \|M'X - Y\|_F^2. \qquad (5.25)$$

The second is the constrained problem of minimising eq. (5.25) subject to

$$Ma = b, \qquad (5.26)$$

for some $d$-vectors $a$ and $b$. Although, in both cases, we require $M$ to have strictly-positive determinant, it is not necessary to enforce this algebraically, since the deformation fields that the $g'_{\mathcal{T}_s}$ are calculated from contain no reflective components.

*To account for translation, we will represent 2D affine transformations in homogeneous coordinates, for which $d = 3$.


It should be noted that the popular Procrustes Analysis method (described, for example, in [43] and [94]) is too restrictive to solve these problems, since it only finds the best-fitting similarity transformations. Also, although our aim is to bring all of the grids in the $g'_{\mathcal{T}_s}$ sequences into a common coordinate system, it is unnecessary for us to perform an affine equivalent of the iterative Generalised Procrustes Analysis method developed by Gower in [42], since the $g'_{\mathcal{T}_s}$ sequences are all deformations of the pre-defined canonical grid $g$.


Unconstrained Affine Registration 


The unconstrained problem is equivalent to solving $d$ linear least squares problems, one for each row of $M$ and $Y$:

$$M_{i,:} = \arg\min_{c^T} \|(Y_{i,:})^T - X^T c\|^2. \qquad (5.27)$$

The solution to this is well-known, and is given by the Moore-Penrose pseudoinverse $(XX^T)^{-1}X$ of $X^T$ (see [78] for the derivation):

$$(M_{i,:})^T = (XX^T)^{-1}X(Y_{i,:})^T. \qquad (5.28)$$

Stacking these solutions for each row of $M$ gives:

$$M = YX^T(XX^T)^{-1}. \qquad (5.29)$$
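As a minimal sketch of the closed-form solution in eq. (5.29), the following NumPy code fits the unconstrained affine matrix for 2D points in homogeneous coordinates. The function names are illustrative, and solving the normal equations with `np.linalg.solve` instead of forming the inverse explicitly is a design choice made here, not something stated in the text.

```python
import numpy as np

def to_homogeneous(points):
    """(N, 2) array of 2D points -> 3 x N matrix of homogeneous columns."""
    points = np.asarray(points, dtype=float)
    return np.vstack([points.T, np.ones(len(points))])

def fit_affine(X, Y):
    """Return the d x d matrix M minimising ||M X - Y||_F (eq. 5.29)."""
    # Solve (X X^T) M^T = X Y^T rather than computing (X X^T)^{-1}.
    return np.linalg.solve(X @ X.T, X @ Y.T).T
```

When the third row of $Y$ is all ones, as it is for homogeneous coordinates, the fitted matrix has $(0, 0, 1)$ as its last row up to rounding error.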


Constrained Affine Registration 


The constraint $Ma = b$ removes $d$ degrees of freedom from $M$. Equivalently, considering each row of $M$ to be a $d$-vector, the constraint tells us that $M_{i,:}$ lies on the hyperplane with normal $\frac{a}{\|a\|}$ that lies at distance $\frac{b_i}{\|a\|}$ from the origin. By this latter interpretation, each $M_{i,:}$ can be parameterised by an unconstrained $(d-1)$-dimensional column vector $W_{:,i}$ that defines a point on the hyperplane. The problem is thus reduced to that of finding the $(d - 1) \times d$ matrix $W$ that solves

$$W = \arg\min_{W'} \|(W'^T B^T + Q^T)X - Y\|_F^2, \qquad (5.30)$$

after which $M$ is given by

$$M = W^T B^T + Q^T, \qquad (5.31)$$

where $B^T$ is a $(d-1) \times d$ matrix of rank $d - 1$ that satisfies

$$B^T a = 0, \qquad (5.32)$$

and $Q^T$ is an arbitrary $d \times d$ matrix that satisfies

$$Q^T a = b. \qquad (5.33)$$


Given $B$ and $Q$, eq. (5.30) is another linear least squares problem, which is solved this time by the pseudoinverse of $(B^T X)^T$:

$$\begin{aligned} W &= \arg\min_{W'} \|W'^T B^T X - (Y - Q^T X)\|_F^2 \\ &= \left((Y - Q^T X)(B^T X)^T (B^T X (B^T X)^T)^{-1}\right)^T, \text{ by analogy with eq. (5.29)} \\ &= (B^T X X^T B)^{-1} B^T X (Y^T - X^T Q). \end{aligned} \qquad (5.34)$$

The rows of $B^T$ define a basis (up to a displacement) for the subspaces occupied by the $d$ parallel hyperplanes that the rows of $M$ lie on (or equivalently, the columns of $B$ form a basis for $a^T$'s null space), and each row of $Q^T$ is a point on the corresponding hyperplane. The choice of $B$ and $Q$ that satisfy eqs. (5.32) and (5.33) is arbitrary. We define $Q^T$ as

$$Q^T = \begin{pmatrix} 0 & \cdots & 0 & \frac{b}{a_i} & 0 & \cdots & 0 \end{pmatrix}, \qquad (5.35)$$
where $i$ is an arbitrary index of $a$ such that $a_i \neq 0$. $B^T$ can be defined by the Singular-Value Decomposition of $a$ (which can be calculated using the methods described in [40], and is implemented in many numerical libraries):

$$a = U\sigma v, \qquad (5.36)$$

where $U$ is an orthogonal $d \times d$ matrix, $\sigma$ is a $d$-dimensional column vector such that $\sigma_{2:d} = 0$, and $v$ is either 1 or -1. This implies that

$$a = U_{:,1}\sigma_1 v. \qquad (5.37)$$

As $U$ is orthogonal, it follows that the columns of $U_{:,2:d}$ are orthogonal to each other and to $a$. So we can take $U_{:,2:d}$ as our rank $d - 1$ subspace basis $B$.
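The constrained solution can be sketched in NumPy as below. This is an illustration under assumptions rather than the original implementation: the function name is hypothetical, the index $i$ is chosen as the entry of $a$ with the largest magnitude (any non-zero entry would do), and the full SVD of $a$ is obtained with `np.linalg.svd(..., full_matrices=True)`.

```python
import numpy as np

def fit_affine_constrained(X, Y, a, b):
    """Minimise ||M X - Y||_F subject to M a = b.

    X, Y: d x N matrices; a, b: length-d vectors.
    """
    d = X.shape[0]
    # Q^T has b / a_i in column i and zeros elsewhere, so that Q^T a = b.
    i = int(np.argmax(np.abs(a)))
    Qt = np.zeros((d, d))
    Qt[:, i] = b / a[i]
    # Columns 2..d of U from the SVD of a are orthogonal to a.
    U, _, _ = np.linalg.svd(a.reshape(d, 1), full_matrices=True)
    B = U[:, 1:]
    BtX = B.T @ X
    # W = (B^T X X^T B)^{-1} B^T X (Y^T - X^T Q), solved without an inverse.
    W = np.linalg.solve(BtX @ BtX.T, BtX @ (Y.T - X.T @ Qt.T))
    return W.T @ B.T + Qt
```

For example, taking `a` to be the homogeneous centre of a deformed grid and `b = (0, 0, 1)` forces the fitted transformation to map that centre to the origin.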


5.3.3 Sampling Phase Offsets and Canonical Grid Similarity Transformations


When constructing a deformation model, a choice has to be made about what representation of the sample data to base the model on. There are many possible choices, and it is not immediately clear what choice would yield the lowest-dimensional model. In our work we have tested two possible choices, which will be described in the next section. First, we will discuss the two types of canonical grid we defined to go with these choices, and we will show how we sampled the similarity transformations that we used to define the sample sets.


The two types of canonical grid we used were: a zero-centred regular grid, $g_s$, with $l \times l$ vertices evenly distributed over the unit square; and a zero-centred quasi-circular grid, $g_c$, of unit area, defined by $r$ uniformly-spaced concentric rings (each of which is a regular $s$-sided polygon, except for the innermost ring, which is degenerate) with $s$ uniformly-spaced spokes delineating $s$ similar isosceles-triangle-shaped segments of the outer ring. Typical examples of the grids are given in figure 5.5. We will use $g$ when discussing things that apply to both grids.
We use a zero-based 2D index $(i, j)$ to refer to the vertices of each grid, where $i$ and $j$ identify the row and column respectively of $g_s$'s vertices:

$$g_s(i, j) = \frac{1}{l-1}\begin{pmatrix} j \\ i \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad (5.38)$$

$$g_c(i, j) = \frac{i}{r-1}\,\rho_s(1)\begin{pmatrix} \cos(2\pi j/s) \\ \sin(2\pi j/s) \end{pmatrix}. \qquad (5.39)$$

The function $\rho_s(a)$ gives the radius of the regular $s$-sided polygon of area $a$, and it is used here to give $g_c$ unit area. It follows from elementary trigonometry that an isosceles triangle with apex angle $\frac{2\pi}{s}$ and side length (i.e. outer ring radius) $l$ has area $l^2\cos\left(\frac{\pi}{s}\right)\sin\left(\frac{\pi}{s}\right)$. Multiplying by $s$, making use of the double-angle formula and rearranging gives $\rho_s(a)$:

$$\rho_s(a) = \sqrt{\frac{2a}{s\sin(2\pi/s)}}. \qquad (5.40)$$


As with the grid-based deformation fields, each vertex $(i, j)$ in $g_s$ is connected to its four (two or three at the boundary) neighbours $(i-1, j)$, $(i+1, j)$, $(i, j-1)$ and $(i, j+1)$, and similarly, each vertex $(i, j)$ in $g_c$'s interior is connected to its four (three at the boundary) neighbours $(i-1, j)$, $(i+1, j)$, $(i, (j-1) \pmod{s})$ and $(i, (j+1) \pmod{s})$. We model deformations by considering the motion of the vertices, and again, we approximate the motion at all other points within the grids by bilinearly interpolating over the quadrilaterals defined by the grids' edges.
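The two canonical grids can be generated with a few lines of NumPy. The sketch below is illustrative and rests on assumptions about the vertex layout that go beyond the prose description: the square grid stores x along columns and y along rows, the rings of the quasi-circular grid are spaced linearly from radius zero to the outer radius, and the first spoke points along the positive x axis. The helper names are hypothetical.

```python
import numpy as np

def square_grid(l):
    """l x l vertices evenly spaced over the zero-centred unit square.
    Returned array has shape (l, l, 2); index [i, j] is (row, column)."""
    t = np.arange(l) / (l - 1) - 0.5
    xs, ys = np.meshgrid(t, t)   # xs varies with column j, ys with row i
    return np.stack([xs, ys], axis=-1)

def polygon_radius(s, area):
    """Circumradius of the regular s-sided polygon with the given area."""
    return np.sqrt(2.0 * area / (s * np.sin(2.0 * np.pi / s)))

def circular_grid(r, s):
    """r rings and s spokes; ring 0 is the degenerate centre ring.
    Returned array has shape (r, s, 2); the outer ring encloses unit area."""
    radii = np.arange(r) / (r - 1) * polygon_radius(s, 1.0)
    angles = 2.0 * np.pi * np.arange(s) / s
    return np.stack([np.outer(radii, np.cos(angles)),
                     np.outer(radii, np.sin(angles))], axis=-1)
```

With `r = 6` and `s = 20`, the outer polygon covers roughly 98% of the area of its circumscribed circle.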


As mentioned earlier, we need to randomly sample phase offsets and similarity transformations $(o, f)$ for the canonical grids over a domain $\Omega$ that has bounds determined by the constraint that $f(g)$ must lie within $v_{t_s+o}$'s image $S_o$. We can draw uniform samples from $\Omega$ by rejection sampling. As $f$ is determined by a 2D displacement $p$, a grid area $a$ and a rotation $\theta$, the simplest form of rejection sampling would involve defining a 5D cuboid $\Omega^* \supset \Omega$, and repeatedly drawing samples from it (sampling from each dimension independently) until we draw one that satisfies the constraint. But we can sample more efficiently by making use of some geometric constraints.

Figure 5.5: (a) The 10 × 10 square canonical grid ($g_s$, $l = 10$) that we used in our experiments, with 2 × 10 × 10 = 200 degrees of freedom. (b) A 6 ring, 20 sector quasi-circular canonical grid ($g_c$, $r = 6$, $s = 20$) that we used. The 20 points of the degenerate innermost ring will always be collocated after any transformation, hence the grid has 2 × (5 × 20 + 1) = 202 degrees of freedom. The two grids have equal area and a similar number of degrees of freedom.


The first constraint, which holds for both canonical grids, is that for any $o$, the displacement $p$ must lie within $S_o$. By making use of the Delaunay triangulation of the landmarks that define $S_o$, we can sample $p$ within $S_o$ by selecting a triangle $\triangle wxy$ with a probability proportional to its area*, and then sampling a point within it. As explained in [117], one way to sample uniformly over $\triangle wxy$ is to begin by drawing a sample $p'$ over the parallelogram $\square wxzy$, where $z = x + \vec{wy}$ (see figure 5.6a):

$$p' = w + U_0\vec{wx} + U_1\vec{wy}, \qquad (5.41)$$

for i.i.d. standard uniform variables $U_0$ and $U_1$. Then we set $p$ to $p'$ if it lies within $\triangle wxy$. Otherwise, $p'$ lies within $\triangle xyz$, and we map it to a point $p$ in $\triangle wxy$ using a linear mapping that maps $z$ to $w$, keeps the line through $x$ and $y$ fixed, and that satisfies $\frac{|\vec{zp'}|}{|\vec{zq}|} = \frac{|\vec{wp}|}{|\vec{wq}|}$,


*Note: when calculating our results at the end of this chapter, we mistakenly selected the triangles with uniform probability instead. In theory, this would have had the effect of biasing the sample set towards the deformations around small triangles. However, our cross-validation results in figure 5.14 on page 167 showed that the deformation mode models performed more-or-less as well on images that they were not trained on as they did on images that they were trained on, suggesting that the biasing effect was insignificant.

Figure 5.6: (a) A point $p'$ uniformly sampled within parallelogram $\square wxzy$ can be linearly mapped to a point $p$ in triangle $\triangle wxy$, giving a uniformly-distributed sample within that triangle. (b) After selecting $o$, $p$ and $\theta$, the smallest distance $u$ from $p$ to $S_o$'s boundary determines the upper bound on the scale factor $a$.


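where $q$ is the intersection of the line through $z$ and $p'$ with the line through $x$ and $y$:

$$p = w + \frac{1}{\mu}\vec{wq}, \qquad q = x + \nu\vec{xy}, \qquad (5.42)$$

and the scalars $\mu$ and $\nu$ are the parameters such that the line $x + \nu\vec{xy}$ intersects the line $z + \mu\vec{zp'}$ through $z$ and $p'$:

$$\begin{pmatrix} \mu \\ \nu \end{pmatrix} = \left(\vec{p'z} \mid \vec{xy}\right)^{-1}\vec{xz}. \qquad (5.43)$$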
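The parallelogram-based triangle sampling can be sketched as follows. This is a minimal illustration with hypothetical function names: a point that falls in the far half of the parallelogram is mapped back by intersecting the line from $z$ through $p'$ with the line through $x$ and $y$, and then rescaling along the line from $w$ to that intersection. The 2 × 2 system is solved with `np.linalg.solve`, and the random generator is assumed to be a `numpy.random.Generator`.

```python
import numpy as np

def fold_into_triangle(p_prime, w, x, y):
    """Map a point of triangle xyz (with z = x + wy) into triangle wxy."""
    z = x + (y - w)
    # Intersect z + mu * (p' - z) with x + nu * (y - x):
    #   mu * (z - p') + nu * (y - x) = z - x
    A = np.column_stack([z - p_prime, y - x])
    mu, nu = np.linalg.solve(A, z - x)
    q = x + nu * (y - x)
    # |zp'| / |zq| = 1 / mu, so place p at the same fraction along wq.
    return w + (q - w) / mu

def sample_triangle(w, x, y, rng):
    """Draw one uniformly distributed point in triangle wxy."""
    u0, u1 = rng.random(2)
    p_prime = w + u0 * (x - w) + u1 * (y - w)   # eq. (5.41)
    if u0 + u1 <= 1.0:
        return p_prime                           # already inside wxy
    return fold_into_triangle(p_prime, w, x, y)
```

The mapping fixes every point of the line through $x$ and $y$ and has unit Jacobian magnitude, which is why the folded samples remain uniformly distributed.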

The second constraint is that $g$'s boundary must lie within $S_o$. We bound the area $a$ above at a manually-selected value $a^+$ that ensures that $f(g)$ cannot contain $S_o$. So the second constraint is satisfied when none of the line segments from $p$ to $f(g)$'s boundary vertices cross $S_o$'s boundary. For $g_s$, after selecting $o$ and $p$, we randomly sample a direction $v = (\cos(\theta), \sin(\theta))^T$ for the vector from $p$ to one of the boundary vertices by sampling the orientation $\theta$ over $[0, 2\pi)$. Then we take the upper bound on $a$ to be at most $2u^2$ (the area of a square with diagonal length $2u$), where $u > 0$ is the smallest radius such that one of $f(g_s)$'s four corners, $p + uv$, $p - uv$, $p + u(-v_y, v_x)^T$ or $p - u(-v_y, v_x)^T$, intersects $S_o$'s boundary. We search for $u$ by calculating the intersections of the lines through $p$ and each of these corners with the line through $a$ and $b$ (given by the solutions of simple linear equations analogous to eq. (5.43)) for each line segment from $a$ to $b$ on $S_o$'s boundary. We also bound $a$ below at a value $a^-$ that defines the area of the smallest grids that we are interested in modelling the deformations of. So if $2u^2 < a^-$, we reject $p$, $o$ and $\theta$ and start again. Otherwise, we sample $a$ over $[a^-, \min(2u^2, a^+)]$.


A similar method could be used to sample $a$ for $g_c$, but for large $s$ it is more efficient to work with the smallest circle centred at $p$ that contains $g_c$ for any given value of $a$. Hence, we take the upper bound on $a$ to be $\min(\pi u'^2, a^+)$ and reject the current samples if $\pi u'^2 < a^-$, where $u'$ is the smallest perpendicular distance from $p$ to $S_o$'s boundary. This only requires $n$ intersection calculations, as opposed to the $sn$ calculations required by the previous method, where $n$ is the number of line segments that make up $S_o$'s boundary. Working with circles means that the value of the orientation parameter $\theta$ has no bearing on the issue of whether or not $S_o$ contains $f(g_c)$ (for fixed $o$ and $p$), but it also has the effect of trimming away some parts of $\Omega$ near the boundary of $S_o$ (for each value of $o$). However, the ($s = 20$)-segment grid we performed our experiments with is so close to being perfectly circular (it occupies about 98% of the area of a circle with the same radius) that the volume of $\Omega$ that is lost will generally be very small in comparison to its original volume.


Although rejection sampling like this gives uniformly-distributed $(o, f)$ samples, the marginal distributions of $a$ produced by these methods tend to be heavily skewed towards small $a$, since each $S_o$ can contain more small grids than large grids, and in some cases there were large changes in the area of $S_o$ over the deformation field sequences (e.g. see the “180608.HS” row in table 5.4). Rather than using uniformly-distributed $(o, f)$ samples with these skewed $a$ marginals, it is more important to us to generate samples that have a uniform marginal distribution over $a$, and a conditionally-uniform distribution over $o$, $p$ and $\theta$ given $a$. This should make for fairer comparisons of the performance of $g_s$-based models and $g_c$-based models (since the marginal distributions over $a$ would probably be different otherwise), and it should make the models attach as much importance to the kinds of deformations that mostly only occur at large scales as they do to those that mostly only occur at smaller scales. So, to make $a$'s marginal distribution more uniform, we partition $[a^-, a^+]$ into $m$ subsets of equal size, and then sample $m$ groups of $\frac{N}{m}$ grids, taking the bounds of each subset as $a$'s sampling range for each group (as in algorithm 5.1, $N$ is the total number of grids to be transformed, and we assume that $m$ divides $N$).


Finally, before $f$'s parameters can be sampled, we have to choose a value for $o$. The only difficulty here is that of determining whether or not we have selected an $o$ for which any $f(g)$ lies in $S_o$. We use a probabilistic argument to infer whether or not this is true. For each $(o, f)$ sample we wish to draw, we initialise $o$'s domain to the set $X = \{0, \ldots, T - 1\}$, draw a sample $o'$ from it and attempt to sample $f$'s parameters as described above. If we are unable to sample $f$ before $M$ rejections are made, we remove $o'$ from $X$, and begin again. Now, for any $o'$ such that $S_{o'}$ contains some $f(g)$s, the number of rejections that will be made before a contained $f(g)$ is found follows a geometric distribution. The probability of finding an $f(g)$ in at most $M$ trials is $P = 1 - (1 - P')^M$, where $P'$ is the probability of finding $f(g)$ in one trial. Although $P'$ is an unknown constant that depends on the size and shape of $S_{o'}$'s boundary, $P$ tends to 1 exponentially in $M$ for all $P' > 0$. So by making $M$ sufficiently large, we can be confident that failure to find any $f(g)$ after $M$ trials means that there are very few similarity transformations $f$ such that $S_{o'}$ contains $f(g)$. The most interesting cases are those where $P'$ is small, which occur when the size of $S_{o'}$'s boundary is similar to the range of sizes of the boundaries of $g$ that we are sampling over. In our tests we used $M = 750$, which leads to a 99.95% probability of success within $M$ trials when $P'$ is 1%.
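The offset selection loop and the success probability quoted above can be sketched as follows. The function names are illustrative; `try_sample_f` is a hypothetical callable that makes one attempt to draw the parameters of $f$ for a given offset and returns `None` when the draw is rejected.

```python
import random

def success_probability(p_single, max_trials):
    """P = 1 - (1 - P')^M for a geometric number of rejections."""
    return 1.0 - (1.0 - p_single) ** max_trials

def sample_offset_and_transform(T, try_sample_f, max_rejections=750,
                                rng=random):
    """Pick an offset o and a transformation f, discarding offsets for
    which no admissible f is found within max_rejections attempts."""
    candidates = list(range(T))
    while candidates:
        o = rng.choice(candidates)
        for _ in range(max_rejections):
            f = try_sample_f(o)
            if f is not None:
                return o, f
        candidates.remove(o)   # very probably no f(g) fits at this offset
    raise RuntimeError("no admissible offset found")
```

Evaluating `success_probability(0.01, 750)` gives approximately 0.99947, in agreement with the 99.95% figure.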


5.3.4 PCA-Based Deformation Mode Models 


Once we have constructed a set of similarity-transformed canonical grids $f(g)$ and warped them with the deformation fields as described in algorithm 5.1 to form sequences $g'_{\mathcal{T}_s}$, the next steps are to calculate the affine registration functions $h_t$ that remove the affine components from each $g'_t$ and then to model the principal modes of residual deformation. The temporal ordering of the $g'_t$s is unimportant as far as modelling the deformation modes is concerned, so we will drop the time subscripts for most of the rest of this section.


Many approaches to deformation modelling have been proposed in the shape analysis literature, and the closest to our work is the Point Distribution Model (PDM) proposed by Cootes et al. in [22], which forms the basis of other important models of deformable objects, such as the Active Appearance Model (AAM) [23]. The PDM in its original form is not quite applicable to our task however, as its training shapes must be defined in terms of landmarks whose correspondences across multiple shapes can be identified. In contrast, we need to be able to model the deformations of any visible part of the myocardium (as we explained earlier), and we do not have a reliable way of identifying corresponding 2D regions of the myocardium across patients (nor is it clear whether or not such correspondences can even be meaningfully defined for all parts of the myocardium). In addition, in the training phase of the PDM, only similarity transformations are removed from the training data, which means that the PDM learns skewing information. Again, as stated earlier, we do not want our deformation mode models to encode any information about the affine components of the deformations that the deformation fields induce on $f(g)$, since we already have a parameterisation of affine transformations (eq. (3.1)).


We will begin by treating each $g = h(g')$ as an instantiation of a $d$-dimensional Euclidean random vector $\mathbf{g}$, with each component of $\mathbf{g}$ corresponding to an ordinate of one of $g$'s vertices. For this representation of $g$, we use the unconstrained affine registration procedure described in section 5.3.2 to calculate $h$, and we assume that there is an orthogonal change of basis under which $\mathbf{g}$'s components undergo independent standard Gaussian variations, i.e. we assume that $\mathbf{g}$ satisfies

$$\mathbf{g} = \mu_{\mathbf{g}} + V\Lambda^{\frac{1}{2}}\begin{pmatrix} x \\ \epsilon \end{pmatrix}, \qquad (5.44)$$

where $\mu_{\mathbf{g}}$ is $\mathbf{g}$'s mean state, $V$ is a matrix of orthonormal basis vectors, $\Lambda^{\frac{1}{2}}$ is a diagonal matrix denoting the standard deviations of $(\mathbf{g} - \mu_{\mathbf{g}})$'s projections onto $V$'s columns and $x$ and $\epsilon$ are vectors of i.i.d. standard Gaussian random variables. We use the usual convention of the standard deviations decreasing monotonically from the top-left element of $\Lambda^{\frac{1}{2}}$ to the bottom-right element, so that for any $c$, the first $c$ columns of $V$ denote the $c$ directions in which $\mathbf{g}$ varies the most: the principal deformation modes. Examples of the principal modes for this Euclidean representation of $g$ are given in figure 5.15 on page 168.


Our intention, in splitting the Gaussian variables between the $n$-dimensional subvector $x$ and the $(d-n)$-dimensional subvector $\epsilon$, is to capture pure noise in $\mathbf{g}$ caused by errors in the deformation field estimation process in the subspace $\mu_{\mathbf{g}} + V_{:,n+1:d}\Lambda^{\frac{1}{2}}_{n+1:d,n+1:d}\epsilon$ and variations in $\mathbf{g}$ mostly due to the true deformations in the orthogonal subspace $\mu_{\mathbf{g}} + V_{:,1:n}\Lambda^{\frac{1}{2}}_{1:n,1:n}x$. This way, after estimating $\mu_{\mathbf{g}}$, $V$ and $\Lambda^{\frac{1}{2}}$, we will be able to discard the noisy subspace, leaving a cleaner, lower-dimensional model of $\mathbf{g}$ expressed in terms of $x$. Note that due to the orthogonality of these subspaces, this equation is equivalent to the factor analysis model:

$$\mathbf{g} = \mu_{\mathbf{g}} + V_{:,1:n}\Lambda^{\frac{1}{2}}_{1:n,1:n}x + V_{:,n+1:d}\Lambda^{\frac{1}{2}}_{n+1:d,n+1:d}\epsilon, \qquad (5.45)$$

in which $x$ is called the common factor, or latent variable. We will use the latter term. More general factor analysis problems with Gaussian latent variables are discussed in [62].


This model of $\mathbf{g}$'s distribution is closely related to the Gaussian factor analysis model with isotropic noise analysed by Tipping and Bishop in [111]. Their model has the following form:

$$\mathbf{g} = \mu_{\mathbf{g}} + Wy + \epsilon', \qquad (5.46)$$

where $\mu_{\mathbf{g}} + Wy$ captures the noise-free variations in $\mathbf{g}$ with a matrix $W$, which is to be estimated from the data, and a $k$-dimensional standard Gaussian random vector $y$. $\epsilon'$ is a zero-mean isotropic Gaussian noise vector with variance $\sigma^2$ in each direction. In that paper, it was shown that the maximum-likelihood estimate of this model's parameters is given by a minor variant of the popular PCA (Principal Component Analysis) method (used for example by Cootes et al. to define the PDM, and discussed in a model-free context in [54]). In particular, if we let $G$ be an $M$-column matrix, each column denoting a sample of $\mathbf{g}$, and we use $\bar{g}$ and $C$ to denote the maximum-likelihood estimates of $\mu_{\mathbf{g}}$ and $\mathbf{g}$'s covariance matrix $\Sigma$, which are given by

$$\bar{g} = \frac{1}{M}\sum_{i=1}^{M} G_{:,i}, \qquad C = \frac{1}{M}GG^T - \bar{g}\bar{g}^T, \qquad (5.47)$$

then the maximum-likelihood estimates $\sigma'^2$ and $W'$ of $\sigma^2$ and $W$, respectively, are:

$$\sigma'^2 = \frac{1}{d-k}\sum_{i=k+1}^{d}\Lambda'_{i,i}, \qquad W' = V'_{:,1:k}\left(\Lambda'_{1:k,1:k} - \sigma'^2 I\right)^{\frac{1}{2}}, \qquad (5.48)$$

where

$$C = V'\Lambda'V'^T \qquad (5.49)$$

is the eigendecomposition of $C$.


The main difference between this model and the one we gave in eq. (5.44) is that while our model does not account for errors in the subspace of the latent variables, it does account for anisotropic errors in the orthogonal subspace, whereas the Tipping and Bishop model accounts for errors in all directions, but assumes them to be isotropic. In practice, some noise will of course remain in the latent variable subspace with our model, and so in section 5.4 we will consider more effective ways of filtering out the remaining noise based on temporal smoothness assumptions. But for now, by setting $\sigma'^2 = \sigma^2 = 0$, $k = d$ and $y = (x^T, \epsilon^T)^T$, it follows from Tipping and Bishop's work that the maximum-likelihood estimates of $V$ and $\Lambda$ in our model are also $V'$ and $\Lambda'$, which is the solution given by standard PCA.


It is worth noting that the standard maximum-likelihood estimators of $\mu_{\mathbf{g}}$ and $\Sigma$ given above assume that the samples $G_{:,i}$ are independent. Despite the $g$s in each non-affine sequence $g_{\mathcal{T}_s}$ that we use to define $G$ not being independent of each other, the cross-validation tests we performed in section 5.5.1 suggest that it is acceptable to use all of the $g$s in each sequence when estimating these parameters.


After calculating $\bar{g}$, $V'$ and $\Lambda'$ and selecting the number $n$ of principal components to use, denoised estimates $\hat{g}$ of each instance $g$ of $\mathbf{g}$ are easily constructed by projecting $g$ into $x$'s subspace, and then mapping that projection back into the original space:

$$g \to \hat{g} = \bar{g} + V'_{:,1:n}(V'_{:,1:n})^T(g - \bar{g}). \qquad (5.50)$$

We will discuss the selection of $n$ in section 5.5.
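The estimation and denoising steps can be sketched in NumPy as below. This is an illustrative sketch with hypothetical function names. It forms the covariance matrix exactly as in eq. (5.47), uses `np.linalg.eigh` for the symmetric eigendecomposition and reorders its output, since `eigh` returns eigenvalues in ascending order.

```python
import numpy as np

def fit_pca(G):
    """G is d x M with one sample per column.
    Returns (mean, V, lam) with eigenvalues lam in descending order."""
    g_bar = G.mean(axis=1)
    C = (G @ G.T) / G.shape[1] - np.outer(g_bar, g_bar)   # eq. (5.47)
    lam, V = np.linalg.eigh(C)                            # ascending order
    order = np.argsort(lam)[::-1]
    return g_bar, V[:, order], lam[order]

def ppca_noise_variance(lam, k):
    """Mean of the discarded eigenvalues, as in eq. (5.48)."""
    return float(np.mean(lam[k:]))

def denoise(g, g_bar, V, n):
    """Project g onto the first n principal modes and map back (eq. 5.50)."""
    Vn = V[:, :n]
    return g_bar + Vn @ (Vn.T @ (g - g_bar))
```

For data that genuinely lie in an $n$-dimensional affine subspace, `denoise` reproduces each sample exactly and the remaining eigenvalues are zero up to rounding error.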
5.3.5 Polar Coordinate PCA-Based Deformation Mode Models


The problem with applying PCA to this Euclidean representation of $g$ is that it is difficult to justify the assumption that $g$'s ordinates exhibit linear dependencies on each other and Gaussian variations. Since the principal modes of the sample covariance matrix define a basis over the space of $g$, we can always reconstruct each instance of $g$ as accurately as we like by using more modes. But it is possible that non-linear methods will lead to more compact models that require fewer parameters to reconstruct $g$ to any given degree of accuracy.


Numerous non-linear alternatives to PCA have been proposed in the literature, such as: kernel PCA [95], which exploits the “kernel trick” to map data samples to a higher-dimensional (potentially infinite-dimensional) space in which the dependencies are closer to linear, so that performing standard PCA in this space yields a more efficient representation; Locally Linear Embedding [88], which locally reconstructs data samples as linear combinations of their neighbours, and then uses eigenanalysis of a matrix defined in terms of the linear combination coefficients to find a lower-dimensional, neighbourhood-preserving parameterisation of the data; and Isomap [108], which uses the Euclidean distances between neighbouring data samples to determine the geodesic distances between all pairs of samples along the manifold that they are implicitly embedded in, and then uses eigenanalysis of a matrix defined in terms of the geodesic distances to calculate lower-dimensional representations.


The main problem with these and other related methods is their time and space complexity. Whereas standard PCA only requires two iterations over the $M$ data samples, followed by the $O(d^3)$* eigendecomposition of a $d \times d$ covariance matrix, where $d$ is the dimensionality of the sample space, all three of these methods involve computing eigendecompositions of $M \times M$ matrices. Our tests at the end of this chapter are based on two image sequences; one is 51 frames long, and the other is 59 frames long. For each test, we apply 1000 similarity transformations to the canonical grids, which means that we use $M = 51000$ and $M = 59000$. Storing $M \times M$ matrices for such large values of $M$ would require 20-28GB of memory at double-precision floating point, which is far more memory than we have. And even if we could store such large matrices, the $O(M^3)$ eigendecompositions would be infeasible to carry out on current hardware.

*When using the symmetric QR algorithm in [40].


Interestingly, [58] suggests a solution to the problem of computing the eigendecomposition for large $M$ in the context of kernel PCA based on the Generalised Hebbian Algorithm, which was introduced through the work published in [75] and [92] as a way of training linear neural networks to find the eigenvectors of the autocorrelation matrix of their input samples. It may perhaps be possible to adapt this to other dimensionality reduction algorithms, and stream sections of the $M \times M$ matrix into memory as they are needed, but this would be a rather complex undertaking, and would probably still be very slow.


In light of these problems, we decided to test the simpler alternative of explicitly non-linearly 
mapping the vertices of the registered grids g = h(g’) to another space, applying linear PCA in 
this space, and then comparing PCA’s performance in this space to its performance in Euclidean 
space. One of the simplest non-linear mappings to try is one in which we map the vertices of the 
gs defined by deformations of the quasi-circular canonical grid g, to a polar coordinate space. 
We use this mapping under the assumption that the myocardial surface can be modelled as a 
surface composed of small regions that undergo local rotations and stretch radially away from 
or towards their rotational centres. So for this space, we use the constrained affine mapping 
defined in section 5.3.2 as the affine registration function h, enforcing the constraint that h 
should map the centres of the deformed grids g’ to the origin, so as to yield a consistent 
mapping. 


After computing the hs, we represent each registered g with a 2sr-dimensional vector $\mathbf{g}$ (s and 
r are the number of sectors and rings, as before). Given some ordering $(r_i, s_i)$ ($i \in 1, \ldots, sr$) 
over the (ring, sector) indices of g’s vertices, we set $\mathbf{g}$’s even ordinates to the radial distances of 
g’s vertices from the origin: 

\[ \mathbf{g}_{2i} = \| g(r_i, s_i) \| , \tag{5.51} \]

and each odd ordinate $\mathbf{g}_{2i-1}$ to a scaling factor $k_i$ multiplied by $g(r_i, s_i)$’s angular deviation 
from the mean direction of the $i$th vertex of all gs in the data set: 

\[ \mathbf{g}_{2i-1} = \begin{cases} 0 , & r_i = 0 \\ k_i \arcsin\left( \hat{\mu}(r_i, s_i) \times \hat{g}(r_i, s_i) \right) , & \text{otherwise} , \end{cases} \tag{5.52} \]

where 

\[ \hat{g}(r_i, s_i) \triangleq \frac{g(r_i, s_i)}{\| g(r_i, s_i) \|} , \quad \hat{\mu}(r_i, s_i) \triangleq \frac{\mu(r_i, s_i)}{\| \mu(r_i, s_i) \|} , \quad \mu(r_i, s_i) \triangleq \frac{1}{T |G_g|} \sum_{g \in G_g} \sum_{t=1}^{T} g_t(r_i, s_i) . \tag{5.53} \]


The space that these gs reside in can be considered to be locally Euclidean, provided that the 
angular deviations lie within the range $(-\frac{\pi}{2}, \frac{\pi}{2})$ radians (in our tests they were well within these 
bounds). So, taking each g to be an instantiation of a Gaussian random vector, we use linear 
PCA to construct deformation mode models in this space, exactly as described in section 5.3.4. 
Since PCA identifies directions of maximum variance, it may not give meaningful results when 
using data sets that consist of different types of measurements (such as lengths and angles). So 
we use the factor $k_i$ to scale the angular deviations to a magnitude comparable to that of the 
radial lengths. We tried two different scaling methods, both based on the fact that the length 
of a circular arc that subtends an angle $\phi$ radians is given by $l\phi$, where $l$ is the arc’s radius. 


For the first method, we scaled all angles uniformly, setting $k_i$ to the mean radial distance of 
the vertices from the origin, calculated over all vertices of all gs: 

\[ k_i = \frac{1}{sr\,T |G_g|} \sum_{g \in G_g} \sum_{t=1}^{T} \sum_{j=1}^{sr} \| g_t(r_j, s_j) \| \quad \forall i . \tag{5.54} \]

This preserves the fact that under the polar coordinate system, angular deviations of $d_\phi$ in 
two vertices at different distances from the origin are considered to be equally significant. The 
second method comes from a suggestion in the literature, which we will mention at the end of 
this section. 


Figure 5.7: The ring 3 vertex of the spoke of g denoted by the solid black line is at a distance 
$l$ from the origin, and it rotates away from the corresponding mean directional vector (shown 
in grey) by $d_\phi$ radians. 

Denoised estimates $\tilde{\mathbf{g}}$ of $\mathbf{g}$ calculated as in eq. (5.50) will of course be in polar coordinates, but 
for analysis we prefer to convert each (radial length, scaled angular deviation) pair $(l_i, k_i d_{\phi,i})$ 
back to Cartesian coordinates, so that we can compare the results with those of the previous 
standard PCA approach. Writing $\hat{\mu}(r_i, s_i)$ for the unit mean direction vector of the $i$th vertex, 
we perform the conversion as follows: 

\[ \tilde{g}(r_i, s_i) = \begin{cases} 0 , & r_i = 0 \\ l_i \begin{pmatrix} \cos(d_{\phi,i}) & -\sin(d_{\phi,i}) \\ \sin(d_{\phi,i}) & \cos(d_{\phi,i}) \end{pmatrix} \hat{\mu}(r_i, s_i) , & \text{otherwise} . \end{cases} \tag{5.55} \]
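
The forward and inverse polar mappings described in this section can be sketched for a single vertex as follows. This is a minimal illustration rather than the implementation used for our tests: the function names are hypothetical, the mean direction is assumed to be supplied as a unit vector, and the sign convention (angles measured from the mean direction to the vertex) is an assumption chosen so that the rotation in the inverse mapping undoes the forward mapping.

```python
import math


def encode_vertex(vertex, mean_direction, k):
    """Map a Cartesian vertex to (radial length, scaled angular deviation).

    mean_direction is assumed to be a unit vector; k is the scaling factor.
    """
    x, y = vertex
    length = math.hypot(x, y)
    if length == 0.0:
        return 0.0, 0.0  # the angle of a vertex at the origin is undefined
    ux, uy = x / length, y / length
    mx, my = mean_direction
    # The 2D cross product of two unit vectors is the sine of the signed
    # angle from the first (mean direction) to the second (vertex direction).
    sine = max(-1.0, min(1.0, mx * uy - my * ux))
    return length, k * math.asin(sine)


def decode_vertex(length, scaled_deviation, mean_direction, k):
    """Rotate the mean direction by the deviation and scale by the length."""
    if length == 0.0:
        return 0.0, 0.0
    d = scaled_deviation / k
    mx, my = mean_direction
    c, s = math.cos(d), math.sin(d)
    return length * (c * mx - s * my), length * (s * mx + c * my)
```

Because the deviation is recovered with an arcsine, the round trip is only exact while the deviation lies strictly within a quarter turn of the mean direction, which matches the locally Euclidean range noted above.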


We are not by any means the first to use polar coordinates in PCA. They were used, for example, 
by Heap and Hogg in [48] in a hybrid shape model for articulated objects like human bodies or 
anglepoise lamps. In their model, collections of points with strong rotational interdependencies 
were represented in polar coordinates, and the other points were represented in Cartesian 
coordinates. Besides the fact that we use constrained affine transformations to register our 
shapes rather than the more common Procrustes method that they use, the main difference 
between our use of polar coordinates and theirs is to do with the kind of angles they measure. 


As their model was designed for articulated objects, they needed to identify pivots c within 
each object. For each object point p that rotates about c, rather than measuring p’s angular 
deviation from a fixed reference direction as we do with the mean direction $\hat{\mu}(r_i, s_i)$, they 
measured the angle between $\vec{cp}$ and $\vec{ca}$ for some reference point a that is also on the object. 
E.g. if p is a point on a human wrist and c is on the elbow, then the reference point might be 
chosen to be somewhere on the shoulder, and the measured angle would be roughly equal to the 
angle between the upper arm and the forearm. Measuring angles like this means that the order 
in which points are reconstructed from linear combinations of principal components is important; 
before p can be reconstructed, its reference point and c must be reconstructed. In our case 
however, such an angle measurement scheme would have created a cycle of dependencies around 
the vertices of each ring, making reconstruction an underdetermined task. 


The second method for calculating $k_i$ that we tried was the method that they suggested, of 
multiplying the angle measured for each (p, c) pair by $\|p - c\|$, which in our case gives 

\[ k_i = \| g(r_i, s_i) \| . \tag{5.56} \]

They chose this scaling method because they considered it to be important to preserve the fact 
that if p’ and p both rotate about c by the same angle, then the point which is furthest from 
c moves the furthest. We will present results using both scaling methods at the end of this 
chapter. 


5.4 Deformation Trajectory Mode Modelling 


Now that we have described our models of the static poses that we can expect the canonical 
grids to be in as a result of the non-affine components of myocardial deformation, the final 
stage is to incorporate these models into a model that takes into account not only the temporal 
dependencies between grids that define a sequence of deformations, but also the dependencies 
that exist between the affine components of the deformations and the non-affine components. 
The fact that there are strong dependencies between the affine and non-affine components 
should be clear from considering, for example, how elastic surfaces undergoing local twisting 
often also undergo some rotation, or how elastic surfaces that are stretched tend to thin in 
the middle. So we will begin this section with a formal definition of what we mean by a 
“deformation trajectory model”. We will then make their link to the deformation mode models 
of the previous section clear, and finally we will describe the specific types of deformation 


trajectory model that we have constructed. 

In the previous section, we considered the affine registration functions $h_{T_r}$ and the non-affine 
canonical grid deformations $g_{T_r}$ to be deterministic, since they were fixed given: a deformation 
field sequence $v_{T_r}$, a random similarity transformation f and a random phase offset o. We 
will now change our perspective by discarding these dependencies, so that the deterministic 
tuple $(h_{T_r}, g_{T_r}) \mid (v_{T_r}, f, o)$ becomes the marginalised random tuple $(h_{T_r}, g_{T_r})$ that is the result 
of applying the sampling procedure described in section 5.3.3 to a deformation field sequence 
uniformly sampled from a training set V. Extending the previous section’s notation, we will 
use the random reference frame $t_r$, random cardiac cycle duration T and random frame list 
$T_r = [t_r, \ldots, t_r + T - 1]$ to denote the frames over which $(h_{T_r}, g_{T_r})$ is defined. 


By the laws of large numbers, as we increase the size of V by defining deformation field 
sequences over more patients, the empirical distribution of the samples we draw of $(h_{T_r}, g_{T_r})$ 
should approach the true distribution over all patients of the sequence of non-affine and affine 
deformations that may occur in a region of the myocardium covered by a randomly 
similarity-transformed canonical grid (provided, of course, that the errors in the deformation 
field estimation procedure are not too large). 

We will use (HG) to denote the space over which the physically plausible instances of $(h_t, g_t)$ 
vary (over all patients), for any frame t, and we will refer to a particular tuple in this space 
as (h, g). In addition, we will also use $(HG)^T$ to denote the space of all physically plausible 
random sequence tuples $(h_{T_r}, g_{T_r})$. 


5.4.1 Deformation Trajectory Models 


Let $(Q_n, \langle\cdot,\cdot\rangle_Q)$ be an inner product space, where $Q_n$ is a vector space of continuous, periodic 
paths $q : \mathbb{R} \to \mathbb{R}^n$ through Euclidean n-dimensional space, each with unit period, and $\langle\cdot,\cdot\rangle_Q$ is 
an inner product on that space. We require the degree of parametric continuity of each path 
at integer parameters to be no less than the minimum degree of parametric continuity at all 
other parameter values, so that the join between consecutive cycles will be as smooth as the 
rest of the path. So each $q \in Q_n$ satisfies 

\[ \left. \frac{d^k q(t)}{dt^k} \right|_{t=s} = \left. \frac{d^k q(t)}{dt^k} \right|_{t = s \bmod 1} \tag{5.57} \]

for all $s \in \mathbb{R}$ and integers $k \in \{0, \ldots, k'\}$, where $k'$ is the smallest integer such that for some 
$s \in (0, 1)$, 

\[ \left. \frac{d^{k'+1} q(t)}{dt^{k'+1}} \right|_{t=s} \tag{5.58} \]

is undefined. Note that $k'$ may be infinite if the paths are infinitely differentiable everywhere. 

We will use the paths of $Q_n$ to model the sequences of affine and non-affine deformations 
that the similarity-transformed canonical grids undergo. The periodicity and differentiability 
requirements ensure that we can begin each sequence at any phase without encountering sudden 
“jumps” in the middle. 


Let $\Upsilon^+$ be a continuous mapping from $\mathbb{R}^n$ to some superset of (HG). We require $\Upsilon^+$ to be 
one-to-one, and hence to have an inverse, which we will call $\Upsilon$. Thus, $\Upsilon(\Upsilon^+(x)) = x$ for 
all $x \in \mathbb{R}^n$. $\Upsilon$ need not be one-to-one, but we require $\Upsilon^+$ to be its pseudoinverse (in the 
sense that for some appropriate measure of distance, $\Upsilon^+(\Upsilon(h, g)) = (h', g')$ is the closest tuple 
to (h, g) in $\Upsilon^+$’s image). $\Upsilon$ is the function we will use to perform an initial noise-filtering 
dimensionality reduction on (h, g), and so it may discard information about its argument, but 
the reconstruction function $\Upsilon^+$ must, of course, preserve all information about its argument. 


We model the sequence $(h_{T_r}, g_{T_r})$ as a noisy sample of $\Upsilon^+$’s image of some random path $\mathbf{q} \in Q_n$, 
in the sense that 

\[ (h_t, g_t) = \epsilon_t\left( \Upsilon^+\left( \mathbf{q}\left( \frac{t - t_r}{T} \right) \right) \right) , \tag{5.59} \]

where the $\epsilon_{t_r}, \ldots, \epsilon_{t_r+T-1}$ are random perturbation functions that result from errors in the 
estimation of the deformation fields and deviations from periodicity due to unmodelled factors 
such as the influence of respiratory motion. Whatever model is chosen for the distribution of 
the $\epsilon_t$, we require them to satisfy 

\[ E\left[ (h_t, g_t) \mid \mathbf{q} = q \right] = \Upsilon^+\left( q\left( \frac{t - t_r}{T} \right) \right) , \tag{5.60} \]

so that they have no effect on average. 


Definition 5.1. Given $Q_n$ and $\Upsilon^+$ for which there are random paths $\mathbf{q} \in Q_n$ that induce the 
distribution of the $(h_{T_r}, g_{T_r})$ up to observational noise in the sense of eq. (5.59), we call each 
particular instance q of $\mathbf{q}$ a deformation trajectory. 

Definition 5.2. A deformation trajectory model (DTM) is a tuple 

\[ (Q_n, \langle\cdot,\cdot\rangle_Q, \Upsilon^+, \Xi_Q, \Xi_\epsilon) , \tag{5.61} \]

where: 

• $Q_n$, $\langle\cdot,\cdot\rangle_Q$ and $\Upsilon^+$ are defined as above; 
• $\Xi_Q$ is a probability space over $Q_n$, defining the distribution of the deformation trajectories; 
• $\Xi_\epsilon$ is a probability space defining the distribution of the observational noise functions $\epsilon_t$. 

Once a DTM has been chosen, the main tasks are to infer the parameters of $\Xi_Q$’s probability 
measure from our noisy observations of $(h_{T_r}, g_{T_r})$, and then to analyse the principal deformation 
trajectory modes of the parameterised distribution (i.e. the principal ways in which the observed 
$(h_{T_r}, g_{T_r})$ sequences vary). We shall make all of this concrete in the following sections, in which 
we will consider a simple, but effective, type of DTM. 


5.4.2 B-Spline Inner Product Spaces 


B-spline curves provide a particularly convenient model for the multidimensional paths of a 
DTM. We shall describe their properties in sufficient detail to make our use of them clear, but 
for more general information about them, we refer the reader to one of the many published 
references, such as [26] and [81]. 


Let $S_{n,c,d}$ be the vector space of periodic B-splines in $\mathbb{R}^n$ for which every B-spline: has the same 
degree d > 1; has the same number of control points c > 2d; is defined on the uniform sequence 
of c + d + 1 knots: 

\[ t_i = \frac{i - d - 1}{c - d} , \quad i \in \{1, \ldots, c + d + 1\} , \tag{5.62} \]

so that $t_{d+1} = 0$ and $t_{c+1} = 1$. We enforce periodicity by requiring the first d control points 
to be equal to the last d. These requirements mean that all B-splines in the space are defined 
on the interval [0, 1], have order d − 1 parametric continuity everywhere in it, and, by virtue 
of their periodicity, can be evaluated at all real parameters by taking the parameters modulo 1, 
as in eq. (5.57). 

$S_{n,c,d}$’s B-splines are uniquely defined by nc-vectors, which we shall refer to as the trajectory 
vectors. Each trajectory vector p is the vector form of an n × c matrix [p], the columns of 
which define the B-spline control points: 

\[ [p] \triangleq \begin{pmatrix} p_1 & \cdots & p_c \\ p_{c+1} & \cdots & p_{2c} \\ \vdots & & \vdots \\ p_{(n-1)c+1} & \cdots & p_{nc} \end{pmatrix} . \tag{5.63} \]

For any given p, the B-spline $q(t \in [0, 1]; p)$ is defined as 

\[ q(t; p) = [p]\, b(t) , \tag{5.64} \]

where $b(t) = (b_{d,1}(t), \ldots, b_{d,c}(t))^T$ is a vector of degree d B-spline basis functions, which can be 
defined by the recursive Cox-de Boor equations: 

\[ b_{0,i}(t) = \begin{cases} 1 , & t_i \le t < t_{i+1} \\ 0 , & \text{otherwise} , \end{cases} \qquad b_{n,i}(t) = \frac{t - t_i}{t_{i+n} - t_i}\, b_{n-1,i}(t) + \frac{t_{i+n+1} - t}{t_{i+n+1} - t_{i+1}}\, b_{n-1,i+1}(t) . \tag{5.65} \]
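
The knot sequence, the Cox-de Boor recursion and the periodic evaluation described above can be sketched directly. This is an unoptimised illustration under stated assumptions, not the code used for our experiments: the function names are hypothetical, control points are passed as a list of c points (each an n-element list), and the caller is assumed to have already made the last d control points equal to the first d.

```python
def knot(i, c, d):
    """Uniform knot t_i of eq. (5.62), with i running from 1 to c + d + 1."""
    return (i - d - 1) / (c - d)


def basis(n, i, t, c, d):
    """Cox-de Boor recursion for the degree-n basis function b_{n,i}(t)."""
    if n == 0:
        return 1.0 if knot(i, c, d) <= t < knot(i + 1, c, d) else 0.0
    left = (t - knot(i, c, d)) / (knot(i + n, c, d) - knot(i, c, d))
    right = (knot(i + n + 1, c, d) - t) / (knot(i + n + 1, c, d) - knot(i + 1, c, d))
    return left * basis(n - 1, i, t, c, d) + right * basis(n - 1, i + 1, t, c, d)


def basis_vector(t, c, d):
    """The vector b(t) of the c degree-d basis functions, with t taken modulo 1."""
    t = t % 1.0
    return [basis(d, i, t, c, d) for i in range(1, c + 1)]


def evaluate(control_points, t, d):
    """Evaluate q(t) as the basis-weighted sum of the control points."""
    c = len(control_points)
    b = basis_vector(t, c, d)
    dims = len(control_points[0])
    return [sum(b[i] * control_points[i][k] for i in range(c)) for k in range(dims)]


# Hypothetical example: a closed planar cubic with c = 8 control points,
# the last d = 3 of which repeat the first 3.
points = [[0, 0], [1, 0], [1, 1], [0, 1], [-1, 0.5], [0, 0], [1, 0], [1, 1]]
print(evaluate(points, 0.0, 3), evaluate(points, 1.25, 3))
```

Taking the parameter modulo 1 before evaluating is what lets the spline be sampled at any real parameter, and the repeated control points are what make the curve close up smoothly at the join.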


It is trivial to show that $S_{n,c,d}$ is a vector space. It suffices to consider the fact that linear 
combinations of any two B-splines $q(\cdot; p)$ and $q(\cdot; p')$ in $S_{n,c,d}$ satisfy 

\[ \alpha q(\cdot; p) + \beta q(\cdot; p') = \alpha [p]\, b(\cdot) + \beta [p']\, b(\cdot) = (\alpha [p] + \beta [p'])\, b(\cdot) , \tag{5.66} \]

which is the B-spline defined by the trajectory vector $\alpha p + \beta p'$. Hence, all of the vector space 
axioms can be expressed in terms of the trajectory vectors (or equivalently, the control points), 
which themselves obviously define a vector space (more specifically, they define a subspace of 
$\mathbb{R}^{nc}$, since the first d control points are repeated). So for some choice of c and d, we can set 
$Q_n = S_{n,c,d}$. 


B-Spline PCA 


We will now momentarily side-step the question of what probability distribution to impose 
upon $S_{n,c,d}$ and consider how PCA may be extended to the analysis of a set of B-splines in a 
distribution-free setting. 

Whatever inner product $\langle\cdot,\cdot\rangle$ is chosen, the problem of identifying linear principal components 
can be expressed in much the same way as the usual distribution-free finite-dimensional 
Euclidean formulation. In particular, given a set of m sample B-splines $q(\cdot; p^{[i]}) \in S_{n,c,d}$ with 
sample mean $\bar{q}(\cdot) = q(\cdot; \bar{p})$, where 

\[ \bar{p} = \frac{1}{m} \sum_{i=1}^{m} p^{[i]} , \tag{5.67} \]

the first k principal components are functions $\vartheta^{[j]} : [0, 1] \to \mathbb{R}^n$ that each maximise the variance 
of 

\[ \langle \vartheta^{[j]}, q(\cdot; p^{[i]}) - \bar{q}(\cdot) \rangle \tag{5.68} \]

over all i, subject to the orthonormality condition 

\[ \langle \vartheta^{[i]}, \vartheta^{[j]} \rangle = \delta_{ij} , \quad \forall (i, j) \in \{1, \ldots, k\}^2 . \tag{5.69} \]

The B-splines can then be reconstructed (to a degree of accuracy that increases with k) by 
calculating linear combinations of the $\vartheta^{[j]}$, 

\[ q(t; p^{[i]}) \approx \bar{q}(t) + \sum_{j=1}^{k} \langle \vartheta^{[j]}, q(\cdot; p^{[i]}) - \bar{q}(\cdot) \rangle\, \vartheta^{[j]}(t) . \tag{5.70} \]


Defining $\mathbf{W}^-$ as an n(c − d) × nc block diagonal matrix that discards the duplicate trajectory 
vector dimensions, i.e. 

\[ \mathbf{W}^- \triangleq \mathrm{diag}(\underbrace{W^-, \ldots, W^-}_{\times n}) , \qquad W^- \triangleq \begin{pmatrix} I_{c-d} & 0_{(c-d) \times d} \end{pmatrix} , \tag{5.71} \]

a simple choice of inner product is the Euclidean inner product $p'^T \mathbf{W}^{-T} \mathbf{W}^- p$, for arbitrary 
trajectory vectors p and p'. For this choice, the above method reduces to mapping each sample 
B-spline $q(\cdot; p^{[i]})$ to the point $\mathbf{W}^- p^{[i]}$, and performing PCA on these points as described earlier. 
The jth principal component will be a B-spline in $S_{n,c,d}$ with a trajectory vector $v^{[j]}$ defined such 
that $\mathbf{W}^- v^{[j]}$ is the eigenvector with the jth largest eigenvalue of the sample trajectory vector 
covariance submatrix $C^*$, defined in terms of the full covariance matrix C by removing duplicate 
rows and columns, i.e.: 

\[ C^* = \mathbf{W}^- C\, \mathbf{W}^{-T} , \qquad C = \frac{1}{m} \sum_{i=1}^{m} p^{[i]} p^{[i]T} - \bar{p} \bar{p}^T . \tag{5.72} \]

The remaining dimensions of $v^{[j]}$ are defined by duplication, to make the corresponding B-spline 
periodic. 


A more common choice of inner product for vector-valued functions over [0, 1] is the sum of $L^2$ 
inner products over the dimensions of the range, which for B-splines in $S_{n,c,d}$ is 

\[ \langle q(\cdot; p), q(\cdot; p') \rangle_{L^2} \triangleq \sum_{i=1}^{n} \int_0^1 q_i(t; p)\, q_i(t; p')\, dt = \int_0^1 q(t; p)^T q(t; p')\, dt . \tag{5.73} \]

This inner product was used with more general functions in [18, 85], and the latter of these 
publications gave a closed-form method for computing the eigenfunctions $\vartheta^{[j]}$ when the sample 
functions have a finite basis expansion. For periodic B-splines, the method involves convolving 
the columns of $C^*$ (with wraparound) with a kernel defined in terms of the basis functions b, and 
then performing PCA as normal on the result to determine the eigenvalues and the trajectory 
vectors of the eigenfunctions (which are B-splines). The resulting matrix is asymmetric, but 
the problem can be transformed into a more computationally stable symmetric eigenanalysis 
problem. The kernel is symmetric and decreases away from its centre. For the degree 3 
B-splines that we use, it is also very narrow and about half of its total weight is concentrated at 
its centre, so the resulting eigenfunctions should not be very different to the result given by the 
Euclidean inner product above. So we use the simpler Euclidean inner product instead. 


5.4.3 A B-Spline-Based Gaussian DTM 


Taking $S_{n,c,d}$ as $Q_n$, and the Euclidean inner product of the control points as $\langle\cdot,\cdot\rangle_Q$, we must now 
consider the construction of the space $\mathbb{R}^n$ that the deformation trajectories reside in. Under our 
model, the path that each deformation trajectory follows through this space is a representation 
of some continuous sequence of affine and non-affine transformations that we can only observe 
indirectly at discrete time points in the form of the noisy $(h_{T_r}, g_{T_r})$ tuples. So for the Euclidean 
inner product over the trajectory vectors to be a meaningful choice of $\langle\cdot,\cdot\rangle_Q$, the space of the 
representations of the transformations that we choose must be as close to Euclidean as possible. 

Our assumption in defining the different types of deformation mode model that we considered 
earlier was that at least one of the types would be a reasonable approximation of a Euclidean 
space of non-affine deformations. So given a deformation mode model that reduces the number 
of dimensions of each non-affine grid $g_t$ to m, we define the first m dimensions of $\mathbb{R}^n$ as the 
subspace of projections of the $g_t$ onto the first m principal components of the deformation mode 
model, turning the DTMs we create into nested principal component models. We will use $\breve{\mathbf{g}}_{T_r}$ 
to denote these random m-dimensional projections, and $\breve{g}_{T_r}$ for specific instances of them. 


As we mentioned in chapter 3, matrix logarithms map matrices to locally Euclidean spaces, 
and so they are good candidates for approximately Euclidean representations of the affine 
deformation components. For the application of the DTMs to deformation simulation that we 
will discuss in the next chapter, it will be more useful for the DTMs to give a representation 
of the inverted affine components $h^{-1}_{T_r}$, since these map from the coordinate system of the 
canonical grids to that of the deformation fields. In addition, we will not need to model the 
translational components of the $h^{-1}_{T_r}$, and we will only need to know the relative states of each 
pair of transformations in $h^{-1}_{T_r}$. In other words, for each instance $h_{T_r}$ of $\mathbf{h}_{T_r}$, it is acceptable 
for the DTM to give an approximation of $(h^{*-1}_{t_r} \circ h', \ldots, h^{*-1}_{t_r+T-1} \circ h')$ as a reconstruction of 
$h^{-1}_{T_r}$, where $h^{*-1}_t$ represents the inverse of the non-translational component of $h_t$, and h' is an 
arbitrary non-singular affine transformation that may be different for each instance of $\mathbf{h}_{T_r}$. The 
non-translational component of each 2D affine transformation $h_t$ is a linear transformation with 
4 degrees of freedom (cf. the A matrix in eq. (3.1)) that we will represent as the 2 × 2 matrix 
$[h^*_t]$. So we set n = m + 4 and define the last 4 dimensions of $\mathbb{R}^n$ to be the space in which the 
elements of the logarithms of these matrices reside. 


We can now define the mapping $\Upsilon$ and its pseudoinverse $\Upsilon^+$. $\Upsilon(h, g)$ returns an (m + 4)-vector 
x such that 

\[ x_{1:m} = \breve{g} , \tag{5.74} \]

where $\breve{g}$ is the m-dimensional projection of g onto the deformation mode model, and 

\[ \begin{pmatrix} x_{m+1} & x_{m+2} \\ x_{m+3} & x_{m+4} \end{pmatrix} = \log([h^*]) . \tag{5.75} \]

We refer to x as a deformation vector. $\Upsilon^+$ then maps x to the pair (h', g'), where the 
non-translational component of h' is given by the matrix exponential of the 2 × 2 matrix that 
$x_{m+1:m+4}$ represents, the translational component is zero, and g' is the result of reconstructing 
$x_{1:m}$ in the original space, as described in earlier sections. 
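
To make the affine part of the deformation vector concrete, the following sketch builds a deformation vector from an m-vector of projections and a 2 × 2 matrix logarithm, and then recovers the linear transformation with a matrix exponential. It is a simplified illustration only: all function names are hypothetical, the matrix exponential is approximated by a truncated Taylor series, and the closed-form logarithm is only valid for the special case of a positively scaled rotation with an angle of magnitude less than π, whereas a general 2 × 2 matrix would need a general-purpose matrix logarithm routine.

```python
import math


def expm_2x2(m, terms=30):
    """Matrix exponential of a 2 x 2 matrix via its truncated Taylor series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term <- term * m / k, so that term holds m^k / k!
        term = [[sum(term[i][l] * m[l][j] for l in range(2)) / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def log_scaled_rotation(scale, angle):
    """Principal logarithm of scale * R(angle), for scale > 0 and |angle| < pi."""
    return [[math.log(scale), -angle], [angle, math.log(scale)]]


def deformation_vector(projection, log_matrix):
    """Concatenate an m-vector with the four entries of a 2 x 2 matrix logarithm."""
    return list(projection) + [log_matrix[0][0], log_matrix[0][1],
                               log_matrix[1][0], log_matrix[1][1]]


def linear_part(x):
    """Recover the 2 x 2 linear transformation from the last four entries of x."""
    a, b, c, d = x[-4:]
    return expm_2x2([[a, b], [c, d]])


x = deformation_vector([0.1, -0.2, 0.05], log_scaled_rotation(2.0, 0.3))
print(linear_part(x))  # approximately 2 * R(0.3)
```

The restriction on the rotation angle in this special case reflects the general issue that logarithms of matrices with large rotational components may not be real.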


For each $(h_{T_r}, g_{T_r})$, the fact that we only need to know the relative states of the inverted affine 
components allows us to transform them into a coordinate system in which the magnitudes 
of their rotational components are no larger than the range of rotations that occurs in $h_{T_r}$, 
reducing the likelihood of the matrix logarithms being complex (see appendix B.2). We can 
accomplish this by simply composing each $h^{*-1}_t$ with $h^*_{t'}$, for some $t' \in T_r$. The choice of $t'$ is 
somewhat arbitrary, so we choose $t' = t_r + o$, which endows each sequence 

\[ (x^{[t_r]}, \ldots, x^{[t_r+T-1]}) = \left( \Upsilon(h^{*-1}_{t_r} \circ h^*_{t_r+o}, g_{t_r}), \ldots, \Upsilon(h^{*-1}_{t_r+T-1} \circ h^*_{t_r+o}, g_{t_r+T-1}) \right) \tag{5.76} \]

of deformation vectors with a “watermark” that allows us to estimate the phase offset o that 
was chosen when $(h_{T_r}, g_{T_r})$ was sampled. As we shall soon see, this watermark carries over to 
the deformation trajectories that model the deformation vector sequences, and hence to the 
deformation trajectories that we will sample from $Q_n$ when carrying out simulations in the 
next chapter, allowing the simulations to identify useful starting points in the deformation 
trajectories. This choice of $t'$ also makes the deformation vector sequences mutually consistent 
in the sense that it ensures that they lie in the same coordinate system, and hence have the 
same geometric significance. 


The watermark relies on the following facts. Firstly, it follows from eq. (5.17) that $g_{t_r+o}$ is just 
the result of registering a similarity-transformed canonical grid $f(\bar{g})$ to $\bar{g}$, and hence $g_{t_r+o} = \bar{g}$ for 
both of the registration methods that we have considered. Secondly, due to the quasi-uniform 
nature of the similarity transformation sampling process, the mean state of the non-affine grids 
g should always tend to $\bar{g}$ as the number of samples increases (see the top row of figure 5.15 
for evidence of this). Hence given $g_{T_r}$ alone, we can identify o by searching for the o' such 
that $g_{t_r+o'} = \bar{g}$, and given $x^{[T_r]}$ alone, we can estimate o as the o'' that minimises $\|x^{[t_r+o'']}_{1:m}\|^2$, 
since $x_{1:m} = \breve{g} = 0$ corresponds to the deformation mode model’s mean. This estimate also 
minimises $\|x^{[t_r+o'']}\|^2$, since $x^{[t_r+o]}_{m+1:m+4}$ is a vectorisation of 

\[ \log\left( h^{*-1}_{t_r+o} \circ h^*_{t_r+o} \right) = \log(I) = 0 . \tag{5.77} \]

We extend this notion to the deformation trajectories with the following definition: 


Definition 5.3. A value $o \in [0, 1)$ is a candidate phase offset for a deformation trajectory 
$q \in Q_n$ if and only if 

\[ \forall o'' \in [0, 1) \; \left( \|q(o)\|^2 \le \|q(o'')\|^2 \right) . \tag{5.78} \]
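
As an illustration of Definition 5.3, a candidate phase offset can be approximated by brute force, by sampling the trajectory on a regular grid over [0, 1) and keeping the parameter with the smallest squared norm. This is only a hedged sketch with hypothetical names, and it is not the method that we actually use to calculate candidate phase offsets over the B-splines, which is described in the next chapter; the trajectory is assumed to be any callable that returns a list of coordinates.

```python
import math


def candidate_phase_offset(q, resolution=1000):
    """Approximate the parameter in [0, 1) that minimises the squared norm of q.

    q is assumed to be a callable mapping a parameter to a list of coordinates;
    the result is only accurate to within the grid spacing 1 / resolution.
    """
    best, best_norm = 0.0, float("inf")
    for k in range(resolution):
        o = k / resolution
        norm = sum(v * v for v in q(o))
        if norm < best_norm:
            best, best_norm = o, norm
    return best


def toy_trajectory(t):
    """A hypothetical periodic path that passes through the origin at t = 0.3."""
    phase = 2.0 * math.pi * (t - 0.3)
    return [math.cos(phase) - 1.0, math.sin(phase)]


print(candidate_phase_offset(toy_trajectory))
```

A grid search of this kind returns only one minimiser, so it would not reveal whether a trajectory has several candidate phase offsets.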


The final parts of the DTM that we have to define are the distributions $\Xi_Q$ and $\Xi_\epsilon$ over 
the deformation trajectories and the observational noise, respectively. We assumed earlier, 
when defining our deformation mode models, that the $g_{T_r}$ could be represented in a (locally) 
Euclidean space in which they were corrupted by Gaussian noise, and that mapping them to 
the $\breve{g}_{T_r}$ vectors had the effect of removing some of this noise. Extending these assumptions to 
include the affine components, we assume that each $(h_{T_r}, g_{T_r})$ has a quasi-periodic subtrajectory 
$(h_{T'_r}, g_{T'_r})$, where 

\[ T'_r \triangleq [t_r, \ldots, t_r + T' - 1] , \quad T' \le T , \tag{5.79} \]

such that each time point t of the corresponding deformation vector sequence $x^{[T'_r]}$ (defined as 
in eq. (5.76)) satisfies 

\[ x^{[t]} = q\left( \frac{t - t_r}{T'} ; \mathbf{p} \right) + \mathbf{e}_t , \tag{5.80} \]

for some random trajectory vector $\mathbf{p}$ and zero mean Gaussian random vector $\mathbf{e}_t$. We allow the 
last T − T' time points of $(h_{T_r}, g_{T_r})$ to be discarded because: a) we only approximately estimate 
the cardiac period when choosing the number of frames to define the deformation fields over; 
and b) due to the effects of respiratory motion and the non-uniform contraction of the heart, 
the times at which different regions of the myocardium return to states that are closest to their 
time $t_r$ states vary. So we calculate T' by searching for the time step near the end of $T_r$ at 
which the deformation vector sequence comes closest to its initial state $x^{[t_r]}$: 

\[ T' = \arg\min_{T'' \in \{[0.75T], \ldots, T-1, T\}} \left\| x^{[t_r+T''-1]} - x^{[t_r]} \right\|^2 . \tag{5.81} \]


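If we make the simplifying assumption that the $\mathbf{e}_t$ are i.i.d. vectors with diagonal covariance 
matrix $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_{m+4}^2)$, so that 

\[ \mathbf{e}_t \sim N(e_t; 0, \Sigma) \propto \exp\left( -\frac{1}{2} e_t^T \Sigma^{-1} e_t \right) , \tag{5.82} \]

then the maximum log-likelihood estimate of the trajectory vector p coincides with the least 
squares estimate, since 

\[ \ln \prod_{t=t_r}^{t_r+T'-1} N\left( x^{[t]} - q\left( \tfrac{t - t_r}{T'}; p \right); 0, \Sigma \right) = \sum_{t=t_r}^{t_r+T'-1} \ln N\left( x^{[t]} - q\left( \tfrac{t - t_r}{T'}; p \right); 0, \Sigma \right) = -\frac{1}{2} \sum_{t=t_r}^{t_r+T'-1} \sum_{i=1}^{m+4} \left( \frac{x^{[t]}_i - q_i\left( \frac{t - t_r}{T'}; p \right)}{\sigma_i} \right)^2 + \text{constant} , \tag{5.83} \]

which can be maximised over each dimension of the B-spline independently, irrespective of 
the values of the standard deviations $\sigma_i$. Reusing the control point duplicate removal matrix 
$W^-$, and defining $W^+$ as a c × (c − d) matrix that duplicates the first d control points of a 
(c − d)-dimensional control point vector: 

\[ W^+ \triangleq \begin{pmatrix} I_d & 0 \\ 0 & I_{c-2d} \\ I_d & 0 \end{pmatrix} , \tag{5.84} \]

the maximum log-likelihood solution is given by the control point matrix $[p] W^{-T} W^{+T}$ that 
minimises the squared error 

\[ \left\| X - [p] W^{-T} W^{+T} B \right\|_F^2 = \left\| \left( x^{[t_r]} \cdots x^{[t_r+T'-1]} \right) - [p] W^{-T} W^{+T} \left( b(0) \cdots b\left( \tfrac{T'-1}{T'} \right) \right) \right\|_F^2 . \tag{5.85} \]

By use of the Moore-Penrose pseudoinverse, as discussed in section 5.3.2, the non-duplicated 
control points $[p] W^{-T}$ that solve this problem are given by: 

\[ [p] W^{-T} = X (W^{+T} B)^T \left( (W^{+T} B)(W^{+T} B)^T \right)^{-1} . \tag{5.86} \]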
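
Since the fit can be carried out over each dimension independently, the least squares solution above can be illustrated for a single dimension without any matrix library. The sketch below is a hedged illustration with hypothetical function names, not the implementation behind our results: it forms the rows of the design matrix by folding the last d basis functions onto the first d (the effect of applying the duplication matrix), solves the normal equations by Gaussian elimination instead of forming a pseudoinverse explicitly, and assumes that there are at least c − d samples, taken at the parameters 0, 1/T', ..., (T' − 1)/T'.

```python
def knot(i, c, d):
    """Uniform knot t_i, with i running from 1 to c + d + 1."""
    return (i - d - 1) / (c - d)


def basis(n, i, t, c, d):
    """Cox-de Boor recursion for the degree-n basis function b_{n,i}(t)."""
    if n == 0:
        return 1.0 if knot(i, c, d) <= t < knot(i + 1, c, d) else 0.0
    left = (t - knot(i, c, d)) / (knot(i + n, c, d) - knot(i, c, d))
    right = (knot(i + n + 1, c, d) - t) / (knot(i + n + 1, c, d) - knot(i + 1, c, d))
    return left * basis(n - 1, i, t, c, d) + right * basis(n - 1, i + 1, t, c, d)


def folded_basis(t, c, d):
    """Basis values at t with the last d functions added onto the first d."""
    b = [basis(d, i, t % 1.0, c, d) for i in range(1, c + 1)]
    folded = b[:c - d]
    for j in range(d):
        folded[j] += b[c - d + j]
    return folded


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_periodic_bspline(samples, c, d):
    """Least squares control values (length c) for one dimension of the samples."""
    t_prime = len(samples)
    rows = [folded_basis(t / t_prime, c, d) for t in range(t_prime)]
    k = c - d
    gram = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    rhs = [sum(r[u] * x for r, x in zip(rows, samples)) for u in range(k)]
    free = solve(gram, rhs)
    return free + free[:d]  # duplicate the first d values to enforce periodicity
```

Applying `fit_periodic_bspline` to each row of the matrix of deformation vectors in turn would give the rows of the full control point matrix under these assumptions.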


Using this method to calculate maximum likelihood estimates of the trajectory vectors that 
explain each observed $(h_{T_r}, g_{T_r})$ gives a set of sample trajectory vectors that we can use to 
parameterise the B-spline distribution $\Xi_Q$. We make the same assumptions about this 
distribution as we made about the distribution of the gs in section 5.3.4, namely that it is an 
nc-dimensional multivariate Gaussian distribution for which the nc − M principal components 
with the smallest eigenvalues capture pure noise, for some number M of principal deformation 
trajectory modes. The process by which we parameterise the distribution and calculate its 
modes is the usual linear PCA process, as described in that section. 

Figure 5.8: This figure shows an example of the result of fitting a B-spline to a quasi-periodic 
deformation vector subtrajectory. The crosses mark the subtrajectories of the first two principal 
components of the deformation mode model and of two of the matrix logarithm entries, and 
the solid lines of matching colours are the maximum likelihood B-splines that model them. 
The vertical red dashed line marks the time point $t_r + o$, and the vertical black solid line behind 
it marks the candidate phase offset calculated over the B-spline using a method that we will 
describe in the next chapter. The data comes from a test based on square canonical grids 
over the 230108.FO1 data set that our results in the next section are partly based on. The 
parameters we used were c = 23, d = 3, m = 40. 


Figure 5.8 shows the typical result of modelling a quasi-periodic deformation vector 
subtrajectory in this way. The candidate phase offsets of the maximum likelihood B-splines tend to be 
close to the time point $t_r + o$, as shown in the figure. Discarding the last T − T' time steps 
occasionally has the unfortunate effect of discarding time point $t_r + o$, but this did not happen 
often enough to have a detrimental effect on our use of the candidate phase offsets, which we 
will discuss in the next chapter. 


Another point that the figure highlights is that some of the matrix logarithm components of the 
deformation vectors were generally larger in magnitude than the components that came from the 
$\breve{g}_{T_r}$. As PCA is sensitive to the relative scales of the dimensions of the vectors that it is applied 
to, it seems that it would be appropriate to rescale the matrix logarithm components. But it is 
not clear how we might go about identifying optimal scaling factors efficiently (a trial and error 
approach in which we test the effects of different scaling factors on the DTM’s reconstruction 
accuracy would be impractically inefficient). Nonetheless, we were able to achieve reasonable 
results without rescaling, as the next section will show. 


5.5 Results 


We shall now present some results that demonstrate the accuracy with which the procedures we 
have described for constructing deformation mode models and DTMs capture the deformations 
in two training sets from two different patients, and the extent to which they reduce the number 
of dimensions of the sequences of canonical grid deformations. Our intention is not to present a 
comprehensive study in which we construct general models that could be applied to any patient 
(we do not have enough videos in which we can manually track sufficiently dense landmark sets 
for that), but rather to provide evidence for the kinds of results we could expect from a larger 


study. 


Table 5.4 provides a set of summary statistics for the two landmark sequences we used, and 
figures 5.11 and 5.12 show some frames from the sequences of deformation fields. The key thing 
to note is that there is a large change in the areas of the convex hulls of the landmarks for 
sequence 180608.HS, as shown in the table by the fact that the 25th area percentile is less than 
half the 100th percentile. This can also be seen directly in the figures of the deformation field 
sequences, and in the original video images shown in figure 5.3. Such large area changes did 
not occur in the 230108.FO1 sequence however, which meant that the $h_{T_r}$ sequences of affine 
canonical grid registrations that we calculated were very different for each sequence. 


We used four different quasi-circular grid representations in our deformation mode models and 
DTMs, which we shall refer to as “Circular, Cartesian”, “Circular, Cartesian, Constrained”, 
“Circular, Polar” and “Circular, Polar, Unif. ang. scaling”. The “Cartesian” grids used the 
Cartesian coordinate representation of the grids that we discussed in section 5.3.4. Of these, 
we carried out the registration to the canonical grid using the constrained affine registration 
method from section 5.3.2 for those labelled “Constrained”, and for those not labelled as such, 
we used the unconstrained method. The grids labelled as “Polar” used the polar coordinate 
representation. We scaled the angular deviations of those labelled “Unif. ang. scaling” 
uniformly over all rings, and scaled the angular deviations of those not labelled as such according 
to the radial distances of the vertices from the origin. Using these four quasi-circular grid types 
allowed us to change each aspect of the representation in a controlled manner, so that we could 
see which aspects, if any, made a significant difference. 


All of our tests were based on sets of 1000 similarity transformations of the canonical grids,
randomly sampled as described in section 5.3.3. We will use the term source frames to refer to
the initial frames that the similarity transformations were sampled with respect to (i.e. $t_s + o$,
for deformation field reference frame $t_s$ and random phase offset $o$), and source grids to refer
to the states of the similarity-transformed canonical grids. To keep the tests as controlled as
possible, we used one set of similarity transformations for each deformation field sequence over
all four quasi-circular grid types. However, because the square grids have a different boundary
shape to the quasi-circular grids, the boundaries of the similarity transformation distributions
for square grids differ from those of the quasi-circular grids, so we used separate sets of similarity
transformations for them. We did, however, try to match the degrees of freedom in the square
grids and the quasi-circular grids as closely as possible across all of our tests. The canonical
grid parameters we used and the resulting degrees of freedom are given in table 5.1.


Grid Type                             Params            DoF

Square                                l = 10            2l^2 = 200
Circular, Cartesian                   r = 6, s = 20     2((r - 1)s + 1) = 202
Circular, Cartesian, Constrained      r = 6, s = 20     2(r - 1)s = 200
Circular, Polar                       r = 6, s = 20     2(r - 1)s = 200
Circular, Polar, Unif. ang. scaling   r = 6, s = 20     2(r - 1)s = 200

Table 5.1: The parameters we used for each type of canonical grid and their degrees of freedom
(DoF). The parameter symbols are the same as those we used in section 5.3.3. As mentioned
earlier, the innermost rings of the unconstrained quasi-circular grids have just 2 DoF, because
they are degenerate. However, the innermost rings of the constrained quasi-circular grids
(including the polar grids) have 0 DoF, as their vertices always lie at the origin.


Grid Type    Diameter Range     Max. Edge Length

Square       [141.4, 254.6]     [11.1, 20]
Circular     [113.8, 204.8]     [11.4, 20.5]

Table 5.2: This table shows the ranges of diameters and maximum edge lengths of the source
grids (in pixels). The maximum edge length of a grid is the maximum distance between neighbouring
vertices. The diameters of the square grids are calculated as the distances between diagonally-
opposite vertices. Their range of side lengths was [100, 180].


We defined the bounds on the isotropic scaling components of the similarity transformations
by requiring the areas of all of the source grids to lie in the interval $[100^2, 180^2]$ pixels$^2$. As
we explained in section 5.3.3, we ensured that the distribution of areas over this interval was
approximately uniform by partitioning it into 20 subintervals and sampling 1000/20 = 50
scaling factors within the bounds imposed by each subinterval. Table 5.2 gives the ranges of
diameters and maximum edge lengths of the source grids imposed by this range of areas (by
“maximum edge length”, we mean the maximum distance between neighbouring vertices in
a grid). Comparing its “Max. Edge Length” column to the “Spc.” columns of table 5.4,
we see that the resolutions of the canonical grids were much finer than the resolutions of the
landmark sets; the distributions of the number of landmarks contained in the source grids are
summarised in table 5.3. Comparing these maximum edge length ranges to the distributions
of deformation field edge lengths shown in figure 5.9, however, it becomes clear that the source
grids we used downsampled the deformation fields by factors of about 1-10. We had to do this
for technical reasons related to the memory footprint of the data sets*, but nonetheless we still
achieved reasonable results, as we shall discuss shortly.
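
The stratified sampling of source grid areas described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name, the seed and the choice to sample areas uniformly within each subinterval are hypothetical, and the full sampling procedure of section 5.3.3 (rotations, translations and boundary checks) is not reproduced here.

```python
import numpy as np

def sample_source_grid_areas(n_total=1000, n_bins=20,
                             area_min=100.0**2, area_max=180.0**2, seed=0):
    """Draw n_total areas, n_total / n_bins from each equal-width subinterval."""
    if n_total % n_bins != 0:
        raise ValueError("n_total must be divisible by n_bins")
    rng = np.random.default_rng(seed)
    per_bin = n_total // n_bins
    edges = np.linspace(area_min, area_max, n_bins + 1)
    # One row of samples per subinterval, uniform within [edges[i], edges[i + 1]).
    areas = rng.uniform(edges[:-1, None], edges[1:, None], size=(n_bins, per_bin))
    return areas.ravel()

areas = sample_source_grid_areas()
# Assumed convention: for a canonical grid of unit area, the isotropic
# scale factor is the square root of the target area.
scales = np.sqrt(areas)
```

Partitioning the interval in this way guarantees that every subinterval of areas is equally represented, which a single uniform draw of 1000 values would only achieve approximately.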


*For each deformation mode model that we constructed, the memory footprint was over 1000DT doubles,
where D is the number of dimensions of each canonical grid (D is $2l^2$ for the square grids and $2sr$ for the
quasi-circular grids) and T is the length of the deformation field sequence. If we had chosen $l = 100$ for the
square grids, then we would have had to store more than $8 \times 1000 \times (2 \times 100^2) \times 50$ B $\approx$ 8 GB.
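
The arithmetic in this footnote can be checked with a short Python sketch. The function name is hypothetical, and we assume 8-byte doubles and a sequence length of 50 frames, as in the footnote's own estimate.

```python
def mode_model_bytes(n_grids, grid_dims, seq_len, bytes_per_double=8):
    """Lower bound on storage for n_grids sequences of grid_dims-dimensional grids."""
    return bytes_per_double * n_grids * grid_dims * seq_len

# Square grids with l = 100 vertices per side have D = 2 * l**2 dimensions.
l = 100
bytes_needed = mode_model_bytes(n_grids=1000, grid_dims=2 * l**2, seq_len=50)
print(bytes_needed / 1e9, "GB")
```

With the l = 10 grids that we actually used, the same expression gives a footprint two orders of magnitude smaller.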


Seq.          Grid Type    Min. no. landmarks
                           25%    50%    75%    95%    100%

180608.HS     Circular     10     7      5      3      1
180608.HS     Square       8      5      4      2      0
230108.FO1    Circular     6      5      3      2      1
230108.FO1    Square       4      3      1      1      0

Table 5.3: This table shows the percentages of source grids that contained at least as many
landmarks as are indicated in the table’s body.

Figure 5.9: Cumulative distributions of deformation field edge lengths (in pixel units).
(a) 180608.HS; (b) 230108.FO1.

Figure 5.10: Cumulative distributions of deformation field landmark reconstruction errors (in
pixel units). (a) 180608.HS; (b) 230108.FO1.
Seq.         T    N    Area (p^2)          Dens. (per (100p)^2)   Diam. (p)                              Spc. (p)         Disp. (p)
                       25    50    100     25    50    100        25          50          100            25   50   100    25   50   100

180608.HS    51   29   49k   81k   115k    2.8   3.6   6.6        (203, 353)  (275, 394)  (358, 449)     49   68   312    6    11   35
230108.FO1   59   30   81k   95k   109k    2.9   3.2   4.1        (335, 401)  (351, 450)  (368, 491)     53   72   333    5    7    45


Table 5.4: Landmark sequences: summary statistics. This table gives summary statistics for the manually-defined landmark sequences
that the results in this section are based on, using the 25th, 50th and 100th percentiles of the indicated measurements, calculated over
all frames. “T” denotes the number of frames that each sequence is defined over, “N” is the number of landmarks in each sequence,
“Area” gives the areas of the convex hulls of the landmarks, “Dens.” is the mean number of landmarks per 10000 square pixels in each
frame, “Diam.” refers to the differences between the most positive and most negative projections of the landmarks onto their principal
components (smallest principal component first), “Spc.” gives the distances between connected landmarks in their Delaunay triangulations
and “Disp.” gives the distances landmarks move in-between consecutive frames. The unit “p” stands for “pixel”.

Figure 5.12: Six deformation grids for video 230108.FO1. As before, the red pluses show the true landmark positions, and the blue crosses
show the deformation fields’ estimates of their positions.

Our tests involved constructing deformation mode models and DTMs from the similarity
transformations, using them to reconstruct the positions of the original landmarks for varying
numbers of modes and B-spline control points, measuring the errors in these reconstructions and
using the errors to determine the numbers of modes and control points we should select for
later applications of the models. The landmark reconstruction errors never reach zero, due
to the errors in the deformation fields, but they tell us the minimum numbers of modes and
control points that the models need to achieve tolerable degrees of error. They are more useful
measures of model fit than the errors with which the models reconstruct regions of the
deformation fields, which do approach zero when the number of control points and the canonical grid
resolution are high enough, but give no indication as to whether or not overfitting is occurring
(i.e. whether or not the models are beginning to fit the deformation field errors rather than
the true myocardial deformations that the fields approximate).


We will now describe each test in more detail and give an analysis of the results. 


5.5.1 Deformation Mode Models 


For each sampled random phase offset $o$, random similarity transformation $f$ and set of
deformation field frame numbers $T_s = \{t_s, \ldots, t_s + T' - 1\}$, we used the following process to reconstruct
the trajectory $y_{T_s}$ of each landmark $y$ that lay in the source grid in frame $t_s + o$ using $m$ of the
$k$ principal components of each deformation mode model:

1. Determine the quadrilateral $\square_{abcd}$ of $f(g)$ that contains $y_{t_s+o}$, and the bilinear
interpolation coefficients $\alpha$ and $\beta$ which, when linearly combined with the quadrilateral’s vertices,
give the landmark’s position exactly. We use the methods we described in section 5.3.1
for this.

2. Calculate a reconstruction $\hat{g}_{T_s}$ of the sequence of non-affine deformed canonical grids $g_{T_s}$
by projecting them onto the model’s principal components, discarding the $k - m$ principal
components with the smallest eigenvalues, and projecting them back into the canonical
grid’s coordinate system (in Cartesian coordinates).

3. Use $\hat{g}_{T_s}$, $\alpha$, $\beta$ and $\square_{abcd}$ to calculate a reconstruction $y'_{T_s}$ of $y_{T_s}$’s trajectory in the
canonical grid’s coordinate system.

4. Transform $y'_{T_s}$ back into the coordinate system of the deformation fields. As the
deformation mode model does not store affine transformation information, this has to be done
with the inverted affine registration functions $h_{T_s}^{-1}$.

Figure 5.13: Landmark reconstruction error percentiles for two deformation mode models
trained and tested on 230108.FO1.
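
Steps 2 and 3 of this process can be illustrated with the following NumPy sketch, which is a simplified stand-in rather than the code used for our experiments. The function names, the row-per-sample data layout and the vertex ordering of the quadrilateral (a to b along $\alpha$, a to d along $\beta$) are assumptions made for the example, and the affine registration of step 4 is omitted.

```python
import numpy as np

def fit_pca(X):
    """PCA of X, an (n_samples, D) matrix of flattened grid deformations."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt, s

def reconstruct(X, mean, Vt, m):
    """Project X onto the first m principal components and map back."""
    coeffs = (X - mean) @ Vt[:m].T
    return mean + coeffs @ Vt[:m]

def bilinear(a, b, c, d, alpha, beta):
    """Point with coefficients (alpha, beta) inside quadrilateral abcd."""
    return ((1 - alpha) * (1 - beta) * a + alpha * (1 - beta) * b
            + alpha * beta * c + (1 - alpha) * beta * d)

# Toy data standing in for the sampled canonical grid deformations.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8)) @ rng.normal(size=(8, 8))
mean, Vt, s = fit_pca(X)
errors = [np.linalg.norm(X - reconstruct(X, mean, Vt, m)) for m in range(1, 9)]
```

Because the quadrilateral and the coefficients are fixed in the source frame, the same `bilinear` call can be applied to the reconstructed vertices in every later frame to obtain the landmark's reconstructed trajectory.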


We did this for each canonical grid representation over a set of 34 values of m, approximately 
quadratically distributed over [10,200] (quadratically distributed so that [10,200] was most 
densely sampled at the low end, where small changes in m make the greatest difference to 
each model’s accuracy). We constructed each model using the deformation data from just one 
video at a time, but we calculated the landmark reconstruction errors over both videos, for 
cross-validation. The first 10 modes of the models for 230108.FO1 are shown in figure 5.15; the 
modes for 180608.HS are similar. 
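
One way of generating an approximately quadratic spacing of this kind is sketched below. The exact values of m that we used are not listed here, so the function name and the rounding rule are assumptions, and the rounding may merge neighbouring values at the low end of the range.

```python
import numpy as np

def quadratic_mode_counts(n_values=34, m_min=10, m_max=200):
    """Integer mode counts spaced quadratically, densest near m_min."""
    u = np.linspace(0.0, 1.0, n_values)
    m = m_min + (m_max - m_min) * u**2
    # Rounding to integers can create duplicates near m_min; drop them.
    return np.unique(np.rint(m).astype(int))

mode_counts = quadratic_mode_counts()
```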


Figure 5.15: These figures show the mean and first 10 modes of the deformation mode models
for all 5 grid types. The top row contains the means, and the remaining rows show the modes,
with eigenvalues decreasing from top to bottom. To highlight the kinds of deformations that
the modes represent, we have scaled them all by nine standard deviations.

Figure 5.13 shows how the reconstruction error percentiles varied with m for two deformation
mode models that we trained and tested on sequence 230108.FO1. We have summarised
the variations of the 50th, 75th, 95th, 98th and 100th error percentiles with m for all four pairs
of training sequences and test sequences in figure 5.14. In that figure, the error bars denote
the upper and lower bounds on the ranges of the percentiles, as functions of m. The numbers
above them are the smallest values of m at which the percentiles fall to within 0.5 pixels
of the minima. The crosses mark the values of the percentiles at these thresholds. Finally,
to facilitate comparison with the deformation field landmark reconstruction errors, we have
marked the same percentiles of those errors with orange horizontal lines.
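
The threshold rule used for the numbers annotated in figure 5.14 can be sketched as follows. The function name and the toy values are hypothetical, and we assume that the rule picks the first sampled m at which the percentile comes within the tolerance of its minimum, even if the curve is not monotonic.

```python
import numpy as np

def smallest_m_within_tolerance(m_values, percentile_values, tol=0.5):
    """First m at which the error percentile is within tol pixels of its minimum."""
    m_values = np.asarray(m_values)
    p = np.asarray(percentile_values, dtype=float)
    within = p <= p.min() + tol
    idx = int(np.argmax(within))  # index of the first True entry
    return int(m_values[idx]), float(p[idx])

# Toy percentile curve (in pixels) over four sampled values of m.
m_star, p_star = smallest_m_within_tolerance([10, 20, 30, 40], [9.0, 5.0, 4.3, 4.0])
```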


In all cases, values of m in the range [32, 49] were sufficient to bring 95% of the errors to within
the range of errors of the deformation fields. In the 180608.HS video, for m > 25, the 50th and
75th percentiles agreed with those of the deformation fields. Also, the five grid types perform
very similarly up to the 98th percentile; the only significant difference between them seems to
be that all of the circular grids produce much larger outliers at the 100th percentile. The
cross-validation tests suggest that the models generalise well, in that up to the 98th percentile, the
percentile ranges for each test sequence are almost identical across the two training sequences.


5.5.2 Deformation Trajectory Mode Models 


Based on the results of the previous section, we constructed all of our DTMs using 40-mode nested
deformation mode models. Although some of the deformation mode models required up to 49
modes for 95% of the errors to lie within the error range of the deformation fields, the 95th error
percentiles with 40 modes were still within the 7 pixel limit that we were willing to accept.


Reusing the previous section’s notation, we used the following procedure to reconstruct the
trajectory $y_{T_s'}$ of each landmark $y$ in each source grid $f(g)$ with $M$ of each DTM’s $K$ modes, where
$T_s' = \{t_s, \ldots, t_s + T' - 1\}$, and $T'$ is the length of the source grid’s quasi-periodic subtrajectory,
calculated as in section 5.4.3:

1. As before, determine the quadrilateral $\square_{abcd}$ of $f(g)$ that contains $y_{t_s+o}$, and the bilinear
interpolation coefficients $\alpha$ and $\beta$.

2. For each pair $(h_t, g_t)$ of affine and non-affine transformations, calculate the 44-dimensional
deformation vector $x_t = \mathcal{T}(h_t^{*-1} \circ h^*_{t_s+o'}, g_t)$ as described in section 5.4.3, where $h_t^*$ is the
non-translational part of $h_t$ and $o'$ minimises $g_{t_s+o'}$’s projection onto the deformation
mode model’s principal components.

3. Calculate the trajectory vector $p$ of the periodic B-spline that best fits the deformation
vector sequence.

4. Project $p$ onto the DTM’s first $M$ modes and reconstruct it to give a trajectory vector $\hat{p}$.

5. For each $t \in T_s'$, calculate a reconstruction $\hat{x}_t$ of $x_t$ by evaluating the B-spline $q$ with
control points $\hat{p}$ at the spline parameter corresponding to frame $t$: $\hat{x}_t = q(\,\cdot\,; \hat{p})$.

6. For each $t \in T_s'$, reconstruct $(h_t^{*-1} \circ h^*_{t_s+o'}, g_t)$ by evaluating $(\hat{h}_t^*, \hat{g}_t) = \mathcal{T}^{-1}(\hat{x}_t)$.

7. Transform $\hat{h}_t^*$ into a transformation $\hat{h}_t^{-1}$ that maps the canonical grid coordinate system
to the deformation field coordinate system, by setting $\hat{h}_t^{-1}$’s non-translational component
to $\hat{h}_t^* \circ h^{*-1}_{t_s+o'}$ and its translational component to that of $h_t^{-1}$.

8. Use $\hat{g}_{T_s'}$, $\alpha$, $\beta$ and $\square_{abcd}$ to calculate a reconstruction $y'_{T_s'}$ of landmark $y$’s trajectory in
the canonical grid’s coordinate system by bilinear interpolation.

9. Transform $y'_{T_s'}$ back into the coordinate system of the deformation fields with the
reconstructed inverse affine registration functions $\hat{h}^{-1}_{T_s'}$.


We did this for all values of $M \in \{30, 40, \ldots, 80, 90\}$ using degree 3 B-splines with $c$ control
points, for all $c \in \{13, 18, \ldots, 33, 38\}$, so that the B-splines had 10, 15, ..., 30, 35 degrees of
freedom. As the two deformation sequences were so different, it would not have been possible
to achieve good cross-validation results with such small values of M, so we only calculated the
landmark reconstruction errors for the sequences that we trained the DTMs on.


Figures 5.16 and 5.17 summarise the 50th, 75th, 95th, 98th and 100th error percentiles much as
before, except that this time the error bars show the positions of each percentile. The 50th
to 98th percentiles were very similar for all grid types in both test sequences. The greatest
differences in 230108.FO1 occurred with c = 13, M = 30, where the 95th and 98th percentiles
of the “Circular, Cartesian, Constrained” and “Circular, Polar” grid types were almost 1 pixel
greater than those of the “Square” grid type. These minor differences diminished with increasing
c and M. Similar behaviour occurred in 180608.HS where, for M = 30, the 98th percentiles
for the “Circular, Polar, Unif. ang. scaling” grid type were 1-2 pixels greater than those of the
“Square” grid type, but again these differences diminished with increasing M.

Figure 5.17: DTM landmark reconstruction error percentiles for 180608.HS under different numbers of B-spline control points (in pixel
units). See text for description.

Figure 5.18: Eight samples from left to right of the mean trajectories and first nine modes of the
square grid DTMs. To highlight the kinds of deformation sequence that the modes represent,
we have scaled them by six standard deviations.

c     M (Sq.)    M (Circ., Cart.)
13    30         60
18    30         40
23    30         40
28    30         30
33    30         30
38    30         30

Table 5.5: Sampled values of M for which the 98th percentiles of the “Square” and
“Circular, Cartesian” landmark reconstruction errors fell within the 7 pixel limit for
180608.HS.

c     M (Sq.)    M (Circ., Cart.)
13    50         60
18    40         50
23    40         50
28    40         50
33    40         40
38    40         40

Table 5.6: Sampled values of M for which the 95th percentiles of the “Square” and
“Circular, Cartesian” landmark reconstruction errors fell within the 7 pixel limit for
230108.FO1.


The 50th and 75th percentiles for all grid types were up to 2 pixels greater than those of the
deformation fields in 230108.FO1, and at most a little over 1 pixel more in 180608.HS. The
differences between the 95th and 98th percentiles of the grids and the deformation fields were
greater, but for sufficiently large values of M, the 95th percentiles fell within the acceptable 7
pixel limit for 230108.FO1 and the 98th percentiles fell within that limit for 180608.HS. The
sampled values of M for which this happened with the “Square” and “Circular, Cartesian” grid
types (which, of all the quasi-circular grid types, had the lowest percentiles) are summarised in
tables 5.6 and 5.5. From these tables and the graphs, it seems that the DTMs based on square
canonical grids consistently lead to the simplest representations of the deformation sequences,
since for each value of c, they minimised the sampled value of M for which the 95th percentile of
the errors was acceptably low, and they also minimised the magnitudes of the 100th percentile
outliers.


Although the B-splines are only intermediate objects in the reconstruction process, it is
nonetheless preferable to minimise c, so as to avoid overfitting during the model construction process.
So, using square grid DTMs with c = 23 and M = 50 for both sequences seems to be a
reasonable compromise between the conflicting desires of minimising c, M and the reconstruction
errors. The 50th to 98th percentiles only decreased by up to 0.5 pixels for the values of c > 23
or M > 50 that we tested in 230108.FO1, and by up to 1 pixel in 180608.HS. Figure 5.18 shows
samples from the first nine modes of these models.


5.6 Conclusion 


In this chapter, we have discussed the methods that we use to construct statistical Deformation
Trajectory Models (DTMs) of the sequences of 2D deformations that the myocardium
undergoes. These models allow us to sample empirically-justified periodic sequences of deformations
that we can use to estimate the parameters of our particle filter’s likelihood functions.


We begin the DTM construction process by fitting deformation fields to sequences of manually-
placed landmarks, and using these to calculate a PCA model that we call a “deformation
mode model”, which gives the principal non-affine deformation modes of a canonical subgrid
randomly selected from somewhere within the image of the deformation fields. We carried out
cross-validation tests on these models, and the results suggested that a single video may
provide sufficient information with which to construct deformation mode models that perform
well on new patients.


The final stages of the DTM construction process involve concatenating the affine
transformation parameters to the deformation modes discovered by the deformation mode model to
form “deformation vectors”, fitting periodic B-splines to sequences of deformation vectors that
describe the deformation of a canonical grid over a cardiac cycle, and then using PCA again
to calculate the principal modes of the B-spline control points. Our tests suggested that using
square grids (with vertices specified in Cartesian coordinates) to construct DTMs leads
to the simplest representation of the sequences of deformations. It was not possible to carry out
cross-validation tests with the DTMs, as it was clear that the sequences of affine deformations
were very different in the two videos we used. So to construct a general-purpose DTM using 2D
coordinates, we will have to combine deformation information from a large number of patients.


Chapter 6


Towards a Strategy for Coping with Particle Loss


6.1 Synopsis 


In this chapter, we shall present our initial attempts at answering the three questions that we
posed on page 115. Our answer to the first question, concerning the selection of the likelihood
function parameters, makes use of the DTMs we developed in the previous chapter. More
specifically, we sample sequences of plausible deformations from the DTMs, and use them to
warp the regions surrounding the initial patch locations, so that we can calculate sample sets of
the differences that we may expect to observe between matching patches/subregions as a result
of myocardial deformations. We then use these sample sets to calculate maximum-likelihood
estimates of the likelihood function parameters.


In answer to the second question, about detecting when particles are lost, we employ a likelihood-
ratio test that compares the likelihood of a patch’s appearance under the hypothesis that it is
lost to the likelihood of its appearance under the hypothesis that it is not lost. We use the patch
likelihood function $L_t$ to define the latter likelihood, and a background likelihood histogram to
define the former. As with $L_t$’s parameters, we construct this histogram using DTM-based
simulated deformations, except that this time we calculate differences between randomly
selected patches/subregions that do not match the reference patch/subregion. We select the test
threshold by minimising the probability of a misclassification under $L_t$ and the background
likelihood function.


For the third question, on the restoration of lost particles, we employ a simple method based
on sampling from the particles that have been labelled “not lost”. This method works well
enough when only a few particles are lost, but cannot be used when all particles are labelled
“lost”. We discuss some alternatives, such as using information from the first frame, and we
describe some of their limitations.

We then end the chapter with an evaluation of our particle loss test’s performance and an
evaluation of the performance of our particle filtering algorithm under all of the methods that
we have described in this thesis.


6.2 Introduction 


The success of any particle filtering algorithm rests on the fidelity of its representation of
the posterior distribution of the hidden states. As explained in [61] and [29], the variance of
the particle weights tends to increase with time during sequential importance sampling, the
eventual effect of which is usually that the weights degenerate to a state in which all but one
of the particles have weights close to zero (one particle will have a large weight in this scenario
as a result of normalisation, but it will not necessarily be located anywhere near a posterior
mode).


The most common way of dealing with this degeneration problem involves calculating the
effective sample size (ESS), first proposed in [60] and defined with respect to the weights $w_t^{(i)}$
of the $n$ time $t$ particles as

$$\mathrm{ESS}_t = \frac{1}{\sum_{i=1}^{n} \left(w_t^{(i)}\right)^2} . \qquad (6.1)$$

The ESS’s value varies between 1, when all but one of the weights are 0, and $n$, when all of the
weights are $\frac{1}{n}$.

Suppose we have two estimates $\hat{\mu}_{\pi,m}$ and $\hat{\mu}_{q,n}$ of the posterior expectation of an arbitrary
function $f(\Theta_{1:t})$, the former being calculated by a Monte Carlo integration process in which $m$
samples are drawn directly from the posterior, and the latter being calculated by drawing $n$
samples from the importance sampler $q_t$. Kong explains in [60] that the expression $n \sum_{i=1}^{n} (w_t^{(i)})^2$ is
an approximation of the relative efficiency of $\hat{\mu}_{\pi,n}$ and $\hat{\mu}_{q,n}$, i.e. if $\hat{\mu}_{q,n}$ has $x$ times the variance
of $\hat{\mu}_{\pi,n}$, the value of this expression should be close to $x$. So given the well-known Monte Carlo
integration theory result that the variance of $\hat{\mu}_{\pi,m}$ is proportional to $\frac{1}{m}$ ([47]), it follows that
the ESS approximates the number $m$ of samples for which $\hat{\mu}_{\pi,m}$ has the same variance as $\hat{\mu}_{q,n}$.


A typical solution to the degeneration problem is to resample the particle set whenever the ESS
falls below a certain threshold, and then reset all of the particle weights to $\frac{1}{n}$. The resampling
step involves sampling $n$ times with replacement from the old particle set, selecting particle
state $\theta_t^{(i)}$ with probability $w_t^{(i)}$ each time. This may be an effective solution when $n$ is large
and the ESS decreases slowly with time. But if the ESS drops suddenly, the particle set may
degenerate before the resampling step occurs, so that the particle set only has a small number
of distinct states after resampling (it is said to be impoverished when this happens).
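
The ESS-triggered resampling scheme described above can be sketched as follows. This is a generic illustration of the standard technique, not our particle filter's code: the function names are hypothetical, the ESS is computed as the reciprocal of the sum of squared normalised weights, and the threshold is left as a free parameter.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for weights normalised to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

def resample_if_degenerate(states, weights, threshold, rng):
    """Multinomial resampling, applied only when the ESS falls below threshold."""
    states = np.asarray(states)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    if effective_sample_size(w) >= threshold:
        return states, w
    # Draw n indices with replacement, index i having probability w[i].
    idx = rng.choice(n, size=n, replace=True, p=w)
    return states[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(1)
states = np.arange(5.0)
new_states, new_weights = resample_if_degenerate(
    states, [0.96, 0.01, 0.01, 0.01, 0.01], threshold=2.5, rng=rng)
```

In the toy example, almost all of the resampled states are copies of the single heavily weighted particle, which is the impoverishment effect mentioned above.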


A common event that causes the ESS to drop suddenly is a period of occlusion (due, for example,
to specular highlights not detected by our specular reflection mask, or due to that mask covering
very large regions) during which the particles disperse. Such events are particularly problematic for our
importance sampler $q_t$ when the time $t-1$ image $\mathcal{I}_{t-1}$ is used to describe the appearance of the
subregions within patch $V(\theta_{t-1}^{(i)})$, especially for small $n$ and occlusion periods lasting for more
than a couple of frames, as the particles are unlikely to remain close enough to the true state
for the pixels within the patch boundaries they define to be a reliable source of information
about the appearance of the patch at a later time when the occlusion has passed. In fact, this
misdirection induced by the persistent use of the previous image is the principal mechanism
through which errors accumulate and particle sets degenerate in our application.


In addition, when using small $n$, it is crucial to maximise both the number of particles that are
in high-posterior-probability states and the diversity of these particles, so that during periods of
genuine ambiguity (such as when a formerly bent vessel momentarily straightens), the particle
filter may maintain a representation of as wide a range of probable patch state hypotheses
as possible. Given this requirement and the misdirection problem of the previous paragraph,
we propose an alternative solution to the problem of maintaining a faithful representation of
the posterior in which, after the generation of each new particle set, we assess each particle
individually to determine whether or not it is “lost” (i.e. in a state of unusually low probability),
and take active steps to restore lost particles to high-probability states in the next frame.


The task of identifying when a particle is lost is best suited to the particle filter component that
evaluates the (dis)similarity of each sampled particle state to its expected appearance (i.e. the
component that evaluates the $\Lambda_t$ function that we introduced in chapter 4). By using reliable
information about what the patch and the region surrounding it should look like at time $t$, this
component can carry out statistical hypothesis tests to infer whether or not each particle is
lost.


In section 6.4, we will consider how these tests may be carried out under information derived
from the first frame only by using the DTMs we developed in the previous chapter to simulate
the appearance of the patch and its background in later frames. These simulations also provide
us with a way of parameterising both the local likelihood functions that define the importance
sampler and the patch likelihood functions $L_t$. So we will cover their parameterisation in that
section, thus providing answers to the first two questions that we posed on page 115. Of course,
simulations based on the first frame alone can only provide an initial estimate of the appearance
changes that the myocardium undergoes, so an interesting topic for future research is how we
might go about improving the reliability of these tests by incorporating information from later
frames, in which we are less certain about the true state of the patch.


After discussing our particle loss tests, we will consider some methods for restoring lost particles,
and then we will conclude with an analysis of the accuracy of our particle filter’s performance
over the two video sequences that we have used throughout this thesis.


6.3 Detecting Particle Loss


Before we can detect and restore lost particles, it is important for us to make the condition
under which we consider them to be lost precise. Returning to the notation we introduced in
sections 3.2.1 and 3.3, the theoretically ideal way of defining whether or not the $i$th particle is
lost at time $t$ involves a consideration of the marginal posterior probability density of the event
$\Theta_t = \theta_t^{(i)}$, where $\Theta_t$ is the unobservable time $t$ patch state, and $\theta_t^{(i)}$ is the observed state
of particle $i$ at time $t$. The marginal posterior distribution $\pi_t'$ is defined with respect to the full
posterior $\pi_t$ by marginalising out the subtrajectory $\Theta_{1:t-1}$:

$$\pi_t'(\theta_t) = \int \pi_t(\theta_{1:t}) \, d\theta_{1:t-1} . \qquad (6.2)$$

This distribution captures all of the information we have about the patch’s state at time $t$. So
we consider the particle to be lost at time $t$ if and only if it lies in a region of this distribution
corresponding to states that the patch has an unusually low marginal posterior probability of
being in. The following definitions make this more precise:

Definition 6.1. Let $\Omega_t^l$ be the region of $\Theta_t$’s domain $\Omega$ defined by the following properties:

1. under $\pi_t'$, the event $\Theta_t \in \Omega_t^l$ occurs with some small, prespecified probability $p$;
2. for all $\theta \in \Omega \setminus \Omega_t^l$ and $\theta' \in \Omega_t^l$, $\pi_t'(\theta) > \pi_t'(\theta')$*.

Definition 6.2. A particle in state $\theta$ at time $t$ is lost if and only if $\theta \in \Omega_t^l$.

As a simple example of these definitions, if $\pi_t'$ were a univariate Gaussian distribution and we
set $p = 1\%$, $\Omega_t^l$ would be the union of the real numbers less than the 0.5th percentile and those
greater than the 99.5th percentile.
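
The univariate Gaussian example can be made concrete with Python's standard library. The function names below are hypothetical and serve only to illustrate definitions 6.1 and 6.2 for this special case, in which the lost region consists of the two equal-probability tails.

```python
from statistics import NormalDist

def gaussian_lost_region(mu, sigma, p):
    """Bounds (lo, hi) such that states below lo or above hi are lost."""
    dist = NormalDist(mu, sigma)
    # By symmetry, each tail carries probability p / 2.
    return dist.inv_cdf(p / 2), dist.inv_cdf(1 - p / 2)

def is_lost(theta, mu, sigma, p):
    lo, hi = gaussian_lost_region(mu, sigma, p)
    return theta < lo or theta > hi

lo, hi = gaussian_lost_region(0.0, 1.0, 0.01)
```

For a standard Gaussian and p = 1%, the bounds lie a little over two and a half standard deviations from the mean.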


A good value of $p$ to choose would be one such that if all of the time $t$ particle states were truly
sampled from $\pi_t'$, the expected number contained in $\Omega_t^l$ would be $< 1$. I.e., for $n$ particles, $p$
should satisfy $pn < 1$.

*Note that this property may not be satisfiable for some values of $p$ if there are regions of $\pi_t'$ with non-zero
Lebesgue measure over which $\pi_t'$ is uniform. The most extreme example of this is when $\pi_t'$ is uniform everywhere
over $\Omega$, in which case this property is unsatisfiable for all $p < 1$.


6.3.1 Likelihood-Ratio-Based Particle Loss Inference


Needless to say, it would be be impractical to attempt to carry out particle loss tests based on 
the definitions above, due, not only to the fact that the integrals that define z} (which includes 
the normalising constant of eq. (3.3)) are intractable to evaluate, but also to the difficulties we 
would face in trying to determine (2)’s boundary, even if we could evaluate z;. Recalling eq. 


(3.9), it may at first seem reasonable to approximate 7/(6,) using the particle weights wh: 


(04) = > 5,68" (6.3) 
i=1 


In cases where this is a good approximation of the value of $\pi'_t$ at the particle states, we could
try sorting the particles in ascending weight order, accumulating their weights, and considering
the first m of these sorted particles to lie in $\Omega'$ if the sum of their weights is < p. This method
would be unreliable if a large number of particles were to drift into $\Omega'$ at around the same time
however, as might happen if the particles dispersed as a result of occlusions, so an alternative
that assesses each particle individually would be preferable.
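The weight-accumulation idea described above can be sketched as follows. This is only an illustration of the rejected baseline, not the loss test that is developed later; the function name and the example weights are hypothetical.

```python
def lost_by_cumulative_weight(weights, p):
    """Flag the lowest-weight particles whose accumulated (normalised)
    weight stays below p. Returns the indices of the flagged particles."""
    total = sum(weights)
    # Visit particles in ascending weight order.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    lost, cumulative = [], 0.0
    for i in order:
        cumulative += weights[i] / total
        if cumulative >= p:
            break  # adding this particle would reach or exceed mass p
        lost.append(i)
    return lost


# Illustrative weights only.
example_weights = [0.5, 0.004, 0.3, 0.003, 0.193]
example_lost = lost_by_cumulative_weight(example_weights, 0.01)
```

Because the decision for each particle depends on the weights of all the others, a simultaneous drift of many particles into the low-probability region raises the accumulated weight and hides the problem, which is the weakness noted above.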


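By eq. (6.2), the expressions we gave for the full posterior and its terms on page 44, and the
expressions we gave for the likelihood $\Lambda_t(\theta_t)$ and its terms on page 51, $\pi'_t$ satisfies

$$\pi'_t(\theta_t) \propto L_t(\theta_t; \Theta_0, Z_{[0,t]}) \, H_t(\theta_t)$$

$$H_t(\theta_t) = \int \ell_t(\theta_t; \theta_{0:t-1}, Z_{0:t}) \, p_t(\theta_t) \, \pi_{t-1}(\theta_{0:t-1}) \, d\theta_{0:t-1} . \qquad (6.4)$$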


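As we take the prior $p_t(\theta_t)$ to be uniform, $H_t(\theta_t)$ is proportional to the expected value of $\ell_t(\theta_t)$
under the time t − 1 posterior $\pi_{t-1}$, i.e.

$$H_t(\theta_t) \propto E_{\pi_{t-1}}\left[ \ell_t(\theta_t; \Theta_{0:t-1}, Z_{0:t}) \mid \Theta_0, Z_{0:t-1} \right] . \qquad (6.5)$$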


Viewing $L_t$ as an expert that turns our Product-of-Experts model of the importance sampler
into a Product-of-Experts model of the full likelihood function $\Lambda_t(\theta_t)$, the role of $\ell_t$ should be
to reinforce, and not undermine, the information that $L_t$ provides. More specifically, for all $\theta_t$


182 Chapter 6. Towards a Strategy for Coping with Particle Loss 


and $\theta'_t$, $L_t$ and $\ell_t$ should ideally satisfy

$$\pi'_t(\theta_t) > \pi'_t(\theta'_t) \Rightarrow L_t(\theta_t; \Theta_0, Z_{[0,t]}) \geq L_t(\theta'_t; \Theta_0, Z_{[0,t]}) \qquad (6.6)$$


In cases where the initial patch is highly distinguishable from its surroundings, the
non-translational deformations between consecutive frames are not large, and the non-affine
deformations between time 0 and time t' are not large for each $t' \in \{1, \ldots, t\}$, each importance
sampling distribution $\ell_{t'}(\theta_{t'}; \Theta_{0:t'-1}, Z_{0:t'})$ should be in close agreement with
$L_{t'}(\theta_{t'}; \Theta_0, Z_{[0,t']})$ on the probabilistic ordering of values of $\theta_{t'}$, because:


a) the patch's distinguishability implies that $L_{t'}$ is a unimodal function of $\theta_{t'}$;

b) for each pair $(S_{t'-1}, S_{t'})$ of square subregions such that $S_{t'-1}$'s centre lies within the time
t' − 1 patch $V(\Theta_{t'-1})$ and $S_{t'}$'s centre lies at the corresponding time t' position, the regions
of images $Z_{t'-1}$ and $Z_{t'}$ bounded by $S_{t'-1}$ and $S_{t'}$ should look similar to each other, since
the non-translational deformations between consecutive frames are small;

c) some $S_{t'}$s should look different to all other time t' subregions that are involved in the
definition of $\ell_{t'}(\theta_{t'}; \Theta_{0:t'-1}, Z_{0:t'})$, which again follows from the patch's distinguishability;

d) each time t' patch $V(\Theta_{t'})$ should look similar to an affine transformation of the initial
patch state $V(\Theta_0)$, since the non-affine deformations are not large.


If the subregion grid centres are sufficiently densely distributed, the distinguishability of the
patch together with our results from chapter 3 on exponential likelihood ratio growth suggests
that the importance sampling distributions $\ell_{t'}$ should be close to delta functions centred at the
true patch states. In turn, this, together with the expected agreement between $L_{t'}$ and $\ell_{t'}$,
suggests that the time t − 1 posterior $\pi_{t-1}$ should be close to a delta function centred at the
true patch subtrajectory $\Theta_{0:t-1}$ (since, under a uniform prior, the posterior is proportional to
the product of the patch likelihoods and the importance sampling distributions). Hence by eq.
(6.5), it follows that

$$H_t(\theta_t) \approx k \, \ell_t(\theta_t; \Theta_{0:t-1}, Z_{0:t}) \qquad (6.7)$$
for some constant k, which implies

$$\pi'_t(\theta_t) \approx k' \, L_t(\theta_t; \Theta_0, Z_{[0,t]}) \, \ell_t(\theta_t; \Theta_{0:t-1}, Z_{0:t}) , \qquad (6.8)$$

for some constant k'. So under the expected agreement between $\ell_t$ and $L_t$ in this case, it is
reasonable to assume that eq. (6.6) holds for most $\theta_t$ and $\theta'_t$. The maximum magnitude of non-
translational deformations between consecutive frames under which this conjectured behaviour 
holds could potentially be increased by using a subregion dissimilarity measure that has strong 
invariance to rotations and non-rigid deformations, such as the Earth Mover’s Distance between 
pixel histograms (as opposed to our current calculation of the EMD between histograms of 


corresponding pixel differences), but we shall explore this in future work. 


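If we assume that eq. (6.6) holds for all $\theta_t$ and $\theta'_t$, then it follows by definition of $\Omega'$ that

$$\forall \theta_t, \theta'_t \; \left( \theta_t \in \Omega', \; \theta'_t \notin \Omega' \Rightarrow L_t(\theta_t; \Theta_0, Z_{[0,t]}) \leq L_t(\theta'_t; \Theta_0, Z_{[0,t]}) \right)$$

$$\therefore \; \forall \theta_t \; \left( \theta_t \notin \Omega' \Rightarrow L_t(\theta_t; \Theta_0, Z_{[0,t]}) > \tau_t \right) , \qquad (6.9)$$

for some threshold $\tau_t > 0$. From this, it follows that there is a threshold $\eta_t > 0$ that satisfies

$$\Delta_{[0,t]}[V(\Theta_0), V(\Theta_t^{(i)})] > \eta_t \Rightarrow \Theta_t^{(i)} \in \Omega' , \qquad (6.10)$$

where $\Delta_{[0,t]}$ is the image dissimilarity function that we defined in chapter 4. The falsehood of
the antecedent here is not generally a sufficient condition for us to conclude the converse, that
$\Theta_t^{(i)} \in \Omega - \Omega'$, and it would not be easy to determine the smallest value of $\eta_t$ for which this test
is true for all $\Theta_t^{(i)}$, but this idea forms the basis for the loss test that we have developed.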


Let

$$W_t^{(i)} \triangleq \left( \Theta_t^{(i)} \in \Omega' \right) \qquad (6.11)$$

be a boolean random variable that takes value $\top$ when the $i$th particle is lost at time t, and
value $\bot$ otherwise. Also, for notational brevity let

$$\Delta_t^{(i)} \triangleq \Delta_{[0,t]}[V(\Theta_0), V(\Theta_t^{(i)})] \qquad (6.12)$$

be a random variable that is completely determined given $Z_{[0,t]}$ and $\Theta_t^{(i)}$. We tackle the
particle loss detection problem in a hypothesis testing framework, in which the aim is to use
our observation of $\Delta_t^{(i)}$ to compare the likelihood of the null hypothesis

$$H_0 : W_t^{(i)} = \top \qquad (6.13)$$

to that of the alternative

$$H_1 : W_t^{(i)} = \bot . \qquad (6.14)$$


We test the null hypothesis using a variant of the classic likelihood ratio test. As described in
[17], the likelihood ratio test is usually used to compare the likelihood of a null hypothesis $H_0$
that a parameter x belongs to some set X' to the likelihood of an alternative hypothesis $H_1$
that x belongs to a set X that contains X'. Given a set of data samples D, the test rejects $H_0$
when

$$\frac{\sup_{y \in X'} P(D \mid x = y)}{\sup_{y \in X} P(D \mid x = y)} < c \qquad (6.15)$$

for some constant $c \in [0, 1]$ used to control the probabilities of type I and type II errors. In our
case, $H_0$ and $H_1$ are disjoint hypotheses, so we allow c to be any value > 0, and we reject $H_0$
if the test

$$\frac{B_{G,t}(\Delta_t^{(i)})}{F_{G,t}(\Delta_t^{(i)})} < c \qquad (6.16)$$

is true, where $B_{G,t}$ and $F_{G,t}$ are functions that we refer to as the background likelihood and the
foreground likelihood respectively. We define them as

$$B_{G,t}(\Delta_t^{(i)}) \triangleq P(\Delta_t^{(i)} \mid W_t^{(i)} = \top, Z_{[0,t]}, \Theta_0)$$

$$F_{G,t}(\Delta_t^{(i)}) \triangleq P(\Delta_t^{(i)} \mid W_t^{(i)} = \bot, Z_{[0,t]}, \Theta_0) . \qquad (6.17)$$

When the test is false, we conclude that there is insufficient evidence in favour of $H_1$, and so
we assume $H_0$.
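The decision rule can be sketched in a few lines. This is a hedged illustration only: the two likelihood functions below are toy stand-ins (a uniform background and an exponential foreground with an arbitrary scale), not the likelihoods that are fitted in the next section, and all names are hypothetical.

```python
import math


def particle_is_lost(delta, background, foreground, c):
    """Return True when the null hypothesis 'the particle is lost' is kept.
    The null is rejected only when background(delta) / foreground(delta) < c;
    the comparison is written as b < c * f to avoid dividing by zero."""
    b = background(delta)
    f = foreground(delta)
    reject_lost = b < c * f
    return not reject_lost


def toy_background(delta):
    """Toy assumption: dissimilarities of lost particles are uniform on [0, 1]."""
    return 1.0 if 0.0 <= delta <= 1.0 else 0.0


def toy_foreground(delta, scale=0.1):
    """Toy assumption: dissimilarities of tracked particles decay exponentially."""
    return math.exp(-delta / scale) / scale if delta >= 0.0 else 0.0
```

With c = 1, a small dissimilarity such as 0.05 gives a ratio well below the threshold and the particle is treated as tracked, whereas a large dissimilarity such as 0.8 leaves the null hypothesis in place and the particle is treated as lost.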


In the next section, we will discuss our formulation of the background and foreground likelihood
functions, and the methods by which we estimate their parameters, the parameters of the
importance sampling distribution $\ell_t$ and the patch likelihood function $L_t$, and the optimal
value of the test threshold c.


6.4 Parameterising the Particle Loss Test 


6.4.1 DTM-Simulation-Based Sample Set Creation 


All four of the likelihood functions we have discussed so far (the background likelihood $B_{G,t}$,
foreground likelihood $F_{G,t}$, patch likelihood $L_t$ and local likelihood $\ell_t$) are defined as functions
of differences between image regions measured by the weighted Euclidean dissimilarity function
$\mathcal{E}$ that we defined in chapter 4. When we defined $L_t$ in terms of the stretched exponential
function $\varphi$, we implicitly assumed it to be both spatially and temporally stationary in the
sense that for any times t and t' and patch states $\theta$ and $\theta'$, the random variable

$$\Delta_t = \Delta_{[0,t]}[V(\Theta_0), V(\theta)] \qquad (6.18)$$

(that is random due to its dependence on $Z_{[0,t]}$) follows the same distribution as the random
variable

$$\Delta_{t'} = \Delta_{[0,t']}[V(\Theta_0), V(\theta')] \qquad (6.19)$$

(that is random due to its dependence on $Z_{[0,t']}$). This follows from the fact that for all $\delta$

$$P(\Delta_t = \delta \mid Z_{[0,t]}, \Theta_0, \Theta_t = \theta) = L_t(\Theta_t = \theta; \Theta_0, Z_{[0,t]}, \Delta_t = \delta) \quad \text{(by definition)}$$
$$= \varphi(\delta; \alpha_L, \gamma_L)$$
$$= L_{t'}(\Theta_{t'} = \theta'; \Theta_0, Z_{[0,t']}, \Delta_{t'} = \delta)$$
$$= P(\Delta_{t'} = \delta \mid Z_{[0,t']}, \Theta_0, \Theta_{t'} = \theta') , \qquad (6.20)$$

where, with slight abuse of notation, we pass $\Delta_t$ and $\Delta_{t'}$ to $L_t$ and $L_{t'}$ to signify the amounts
by which the regions of images $Z_t$ and $Z_{t'}$ within $V(\theta)$ and $V(\theta')$ differ from the region of
$Z_0$ within $V(\Theta_0)$. The spatiotemporal stationarity of $\ell_t$ follows by a similar argument, since
under our model, the hyperparameters $\alpha_\ell$ and $\gamma_\ell$ that define the distribution of the subregion
difference random variable $\Delta_\ell$ do not vary with time or with the states of the patches that are
being compared. We extend these stationarity assumptions to the background and foreground
likelihoods, and to emphasise this we will drop their t subscripts from now on.


We divide the process of parameterising $B_G$, $F_G$, $L_t$ and $\ell_t$ into the following two steps:


1. Construct sets $D_B$, $D_F$, $D_L$ and $D_\ell$ of image differences that follow the distribution that
we want each likelihood function to represent.

2. Identify the hyperparameters of the likelihood functions that maximise the likelihoods of
the $D_*$ sample sets.


We shall address the construction of the $D_*$ sets in this section, and deal with the
maximum-likelihood estimation of the likelihood function hyperparameters in section 6.4.3.


For all t, let $v^*_t(p)$ be a deformation field that perfectly maps each pixel p in image $Z_0$ to the
corresponding pixel in $Z_t$, and define $\Theta^*_t$ such that $M(\cdot; \Theta_0, \Theta^*_t)$* is the best affine
approximation of the transformation that $v^*_t$ induces on each point in the initial patch $V(\Theta_0)$. By the
assumed stationarity of $L_t$, we could then define $D_L$ as some subset of

$$\{ \Delta_{[0,t]}[V(\Theta_0), V(\Theta^*_t)] : t > 0 \} . \qquad (6.21)$$


As $v^*$ is unknown however, we create an approximation of this set using the DTM. To do this, we
sample $n_L$ deformation trajectories from the DTM, sample each one at $T_L + 1$ uniformly-spaced
phases, and use the resulting deformations $v_{L,1:n_L,0:T_L}$ to warp the initial lightness-normalised
image and occlusion mask. For each deformation trajectory $q^{(k)}$, we would ideally then calculate


*M is the affine coordinate transformation that we defined on page 53. 
the $T_L$ best-fitting affine transformation parameters $\Theta'^{(k)}_{1:T_L}$ and then add the set

$$\{ \Delta_{v_{L,k},[0,t]}[V(\Theta_0), V(\Theta'^{(k)}_t)] : 1 \leq t \leq T_L \} \qquad (6.22)$$

to $D_L$, where $\Delta_{v_{L,k},[0,t]}$ is the result of applying $\Delta_{[0,t]}$ to the time 0 and t warped images
produced by the $k$th $v_L$ (we will use similar notation later with other deformation field sequences
sampled from the DTM). However, we found that this process tended to underestimate the
standard deviation of $D_L$, leading to numerically unstable underestimates of the $\gamma_L$ inverse
decay rate parameter of $L_t$. The main reasons for this seem to be that the $\Theta'^{(k)}_t$ fit the
deformations generated by the DTM too well, and that we have not modelled all of the
non-deformational appearance changes that occur on the myocardium, such as changes due to noise
and changes of colour/texture that occur as the surface stretches and contracts. Some of the
colour/texture changes look like the result of surface features fading, giving way to the prevailing
colour of their surroundings, or colours in one region flowing into another (figure 6.1 gives some
examples of these changes). We compensate for the DTM's inability to produce these effects
by generating $m_L$ small, zero-mean Gaussian-distributed perturbations $\delta\Theta^{(j,k,t)}$ and setting

$$D_L = \{ \Delta_{v_{L,k},[0,t]}[V(\Theta_0), V(\Theta'^{(k)}_t + \delta\Theta^{(j,k,t)})] : 1 \leq k \leq n_L, \; 1 \leq j \leq m_L, \; 1 \leq t \leq T_L \} , \qquad (6.23)$$

where $\Theta'^{(k)}_t + \delta\Theta^{(j,k,t)}$ represents the result of perturbing $\Theta'^{(k)}_t$ by $\delta\Theta^{(j,k,t)}$. We will make all of
this more precise in the next section.

The variations in $\Delta_t^{(i)}$ that $F_G$ describes are the result of $\Theta_t^{(i)}$'s variations over $\Omega - \Omega'$, so
once again, we measure patch differences with respect to $n_F$ simulated sequences of $T_F + 1$
deformations $v_{F,1:n_F,0:T_F}$. Due to the difficulty of determining $\Omega'$'s boundary after choosing
a probability p that defines it, we instead assume that for some unknown p, $\Omega - \Omega'$ is well
approximated by a neighbourhood $N^{(k)}_t$ of $\Theta'^{(k)}_t$. Clearly, increasing $N^{(k)}_t$'s size will have the
effect of decreasing the p that corresponds to the $\Omega - \Omega'$ that $N^{(k)}_t$ best fits.


The ideal way to generate $D_F$'s samples would be the rejection sampling approach of running
the particle filter repeatedly and adding $\Delta_{v_{F,k},[0,t]}[V(\Theta_0), V(\Theta_t^{(i)})]$ to $D_F$ whenever $\Theta_t^{(i)} \in N^{(k)}_t$.
(a) (b) (c) (d)


Figure 6.1: The encircled regions of (a) and (b) are an example of the kind of appearance
change in which the colour of a feature (the near-vertical vessel section in (a)) fades away. (c)
and (d) show some other changes of colour/texture that the DTM cannot generate. In some
regions, colours seem to flow into each other, but in others new colours seem to appear from
nowhere. Constructing $D_L$ by perturbing $\Theta'^{(k)}_t$ can account for the former of these occurrences
to an extent, but it cannot account for the latter.

All four images were taken from the 230108.FO1 sequence, after lightness normalisation. (a) 
and (c) are from frame 11 of the sequence, and (b) and (d) are from frame 21. 


But this would be impractical, especially considering the fact that the probability of the particles
falling in each $N^{(k)}_t$ would generally decrease as t increases. So instead, we use rejection sampling
to draw $m_F$ patch state samples from the region of a $\Theta'^{(k)}_t$-centred Gaussian distribution that
lies within $N^{(k)}_t$, adding the corresponding $\Delta_t$ values to $D_F$. Note that when the patch is very
easy to distinguish from its surroundings, the neighbourhoods that correspond to reasonably
small values of p should all be small, and so we can set $D_F = D_L$.

Similarly to $F_G$, $B_G$ describes the variations in $\Delta_t^{(i)}$ that result from $\Theta_t^{(i)}$'s variations over $\Omega'$.
We have no prior reason to assume that any patch state in $\Omega'$ is more likely than another, so we
define $D_B$ by drawing $m_B$ uniformly-distributed state samples from each of the $T_B$ frames of
$n_B$ simulated image sequences, using rejection sampling to ensure that the samples lie outside
$N^{(k)}_t$ and within the range of transformations that the importance sampler can generate for a
particle in state $\Theta'^{(k)}_{t-1}$ at each time t − 1 (we defined this range in section 3.3.3).
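The two rejection sampling schemes can be sketched as follows. This is a simplified illustration under stated assumptions: patch states are plain lists of numbers, the neighbourhood is an axis-aligned box given by per-component half-widths, and the importance sampler's range is a larger box around the same centre (in the text, that range is defined relative to the previous time step). All function and parameter names are hypothetical.

```python
import random


def in_box(state, centre, half_widths):
    """True when every component of state lies within its half-width of centre."""
    return all(abs(s - c) <= h for s, c, h in zip(state, centre, half_widths))


def sample_foreground(centre, half_widths, stds, rng, max_tries=10000):
    """Draw from a Gaussian centred on `centre`, rejecting draws that fall
    outside the neighbourhood box."""
    for _ in range(max_tries):
        state = [rng.gauss(c, s) for c, s in zip(centre, stds)]
        if in_box(state, centre, half_widths):
            return state
    raise RuntimeError("no foreground sample accepted")


def sample_background(centre, half_widths, sampler_half_widths, rng, max_tries=10000):
    """Draw uniformly from the sampler's range, rejecting draws that fall
    inside the neighbourhood box."""
    for _ in range(max_tries):
        state = [rng.uniform(c - r, c + r) for c, r in zip(centre, sampler_half_widths)]
        if not in_box(state, centre, half_widths):
            return state
    raise RuntimeError("no background sample accepted")
```

Each accepted state would then be passed to the dissimilarity function to produce one element of the foreground or background sample set.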


The local likelihood functions $\ell_t$ that multiplicatively combine to define the importance
sampler represent the variations in the differences $\Delta_{\ell,t}$ between the image-boundary-aligned
subregions $S(g_{t-1})$ of $V(\Theta_{t-1})$ and the corresponding image-boundary-aligned subregions $S(g_t)$
of $V(\Theta_t)$, where each subregion centre $g_{t-1}$ is a grid vertex in the set $G_{t-1}$ of vertices within
$V(\Theta_{t-1})$, calculated as described in section 3.3.1, and $g_t = M(g_{t-1}; \Theta_{t-1}, \Theta_t)$. So again, we use
$n_\ell$ sequences of $T_\ell$ images $Z'_{\ell,1:n_\ell,0:T_\ell}$ to generate samples for $D_\ell$, constructing the set as follows:

$$D_\ell = \{ \Delta_{v_{\ell,k},[t-1,t]}[ S(g^{(k)}_{t-1}), \; S(M(g^{(k)}_{t-1}; \Theta'^{(k)}_{t-1}, \Theta'^{(k)}_t + \delta\Theta^{(k,t)})) ] :$$
$$1 \leq k \leq n_\ell, \; g^{(k)}_{t-1} \in G^{(k)}_{t-1}, \; 1 \leq t \leq T_\ell \} , \qquad (6.24)$$

where $G^{(k)}_{t-1}$ is the set of grid vertices within $V(\Theta'^{(k)}_{t-1})$, $\Delta_{v_{\ell,k},[t-1,t]}$ is $\Delta_{\ell,t}$ applied to images
warped by the simulated deformations $v_{\ell,k,t-1:t}$, and $\delta\Theta^{(k,t)}$ is a Gaussian perturbation that we
use once again to account for colour changes.


6.4.2 Simulation Details 


We begin the generation of a simulated image sequence by sampling a trajectory vector p from
a B-Spline-Based Gaussian DTM. To generate p using the M largest modes of the DTM, we
generate an M-vector $\varepsilon$ of independent standard Gaussian random variables using the
Box-Muller transform ([15]), and set p to

$$p = \bar{p} + E_{1:M} \, \mathrm{diag}(\lambda_1^{1/2}, \ldots, \lambda_M^{1/2}) \, \varepsilon , \qquad (6.25)$$

where $\bar{p}$ is the model's mean trajectory vector, $\lambda_i$ is its $i$th largest eigenvalue, and $E_{1:M}$ is the
matrix of M corresponding eigenvectors.
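The sampling step can be illustrated with a small standard-library sketch. It assumes the mean trajectory, eigenvectors and eigenvalues are already available as plain Python lists; the function names and the list-based data layout are hypothetical choices made for this example.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard Gaussian samples from two uniforms."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def sample_trajectory(mean, eigvecs, eigvals, rng):
    """Sample mean + sum_i sqrt(eigvals[i]) * eps_i * eigvecs[i].
    mean: list of length D; eigvecs: M lists of length D; eigvals: M values."""
    eps = []
    while len(eps) < len(eigvals):
        eps.extend(box_muller(rng))
    sample = list(mean)
    for vec, lam, e in zip(eigvecs, eigvals, eps):
        scale = math.sqrt(lam) * e
        for d in range(len(sample)):
            sample[d] += scale * vec[d]
    return sample
```

Scaling each standard Gaussian coefficient by the square root of its eigenvalue gives the sample the same variance along each retained mode as the training data had.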


Before we can use the deformation trajectory $q(\cdot; p)$ defined by p to deform the initial images,
we must recalculate/resample the deformation sequence properties that were removed during
construction of the model, namely the period T of the cardiac cycle (in frames), the phase o at
which to begin the sequence, and a 2D similarity transformation

$$f(a; \rho, \phi) = \rho R(\phi) a + d(\Theta_0) \qquad (6.26)$$

that maps the sequence of deformed grids defined by q from the canonical grid's coordinate
system to that of the patch's initial state $V(\Theta_0)$, where $R(\phi)$ is an anticlockwise rotation by
$\phi$ radians, $d(\Theta_0)$ is the initial position of the patch, and $\rho$ is a scaling factor. The rotation
parameter is arbitrary, so we uniformly sample it over $[0, 2\pi)$.


Given these properties, an image Y; can be generated from an initial image Yo using standard 


graphics library texturing functions. In particular, letting 


(ie * (« (c ~ — p)) (6.27) 


denote the time t inverse affine transformation $h'^{-1}_t$ and non-affine deformed canonical grid
$g'_t$ (as calculated by the mapping Y* that we defined in section 5.4.3), we generate texture
coordinates that map the vertices of the initial grid $f(h'^{-1}_0(g'_0))$* to the corresponding pixels of
$Y_0$, and then we generate the time t image $Y_t$ by rendering the quadrilaterals of $f(h'^{-1}_t(g'_t))$,
assigning the initial texture coordinates to the corresponding vertices of this new grid, so that
the motion of the vertices warps the initial image. We set each occlusion mask to 1
everywhere outside grid $f(h'^{-1}_t(g'_t))$, indicating that the pixels outside the grids are undefined.


Figure 6.2 shows an example of this process rendered with OpenGL. 


Our tests at the end of this chapter were all performed offline, so we defined T manually. But 
for general use, it could be determined by a number of methods. The simplest method would 
of course be to obtain the information from an ECG. But it may be preferable to measure it 
directly from the video in some instances, such as when an ECG is not available. A simple 
method that we have found to be reasonably effective in the past when dealing with videos 
that contain no camera motion is to calculate the autocorrelation of the pixels within a fixed 
region of the images, and take T to be the period between the peaks in the autocorrelation 
signal. A more sophisticated and robust method, that can handle camera motion, intrinsic 
camera parameter changes and affine respiratory motion, but requires a set of landmarks to 
have already been tracked, was proposed in [50]. It uses the epipolar constraint to identify time 
points at which the landmark sets are equal up to a projective transformation of their original 
unknown 3D positions, and hypothesises that these time points must occur at the same phase 


of each cycle. 
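The autocorrelation approach can be sketched as follows, under the assumption that the video has already been reduced to a one-dimensional signal (for example, the mean intensity of a fixed image region in each frame). The function name, the `min_lag` parameter and the peak-picking rule are hypothetical simplifications.

```python
import math


def estimate_period(signal, min_lag=2):
    """Estimate the period (in samples) of a roughly periodic signal as the
    lag of the first positive local maximum of its autocorrelation."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]

    def autocorr(lag):
        # Average product of the signal with a copy of itself shifted by lag.
        return sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)

    max_lag = n // 2
    ac = [autocorr(lag) for lag in range(max_lag + 1)]
    for lag in range(max(min_lag, 1), max_lag):
        if ac[lag] > 0.0 and ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None  # no clear peak found


# Synthetic check: a sinusoid with a period of 25 frames.
synthetic = [math.sin(2.0 * math.pi * i / 25.0) for i in range(200)]
```

On real footage, the signal would typically need smoothing first, and the peak rule would need to tolerate noise; neither refinement is attempted here.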


*We use $f(h'^{-1}_t(g'_t))$ to indicate the application of f to every point in $h'^{-1}_t(g'_t)$.
(g) Frame 36 (h) Frame 42 (i) Frame 48


Figure 6.2: Some frames from a 50 frame deformation sequence sampled from the DTM that we 
constructed for the 180608.HS sequence and applied to the region of the first image surrounding 
the initial feature state shown in (a). For reference, we have rendered a regular grid with 24- 
pixel-wide squares over each image. 
Calculating the Starting Phase


As the scaling components of the similarity transformations used in the construction of the
deformation mode model defined the amounts by which the canonical grid g should be scaled,
we must choose a starting phase o at which the shape of the grid defined by the deformation
trajectory is as close as possible to g's shape, so as to ensure that the range over which we sample
$\rho$ is comparable to the scaling factor range used in the deformation mode model. The easiest way
to identify such a starting phase is to search for one of the deformation trajectory's candidate
phase offsets, which we defined in section 5.4.3 as the minimisers of $\kappa(o') \triangleq \|q(o'; p)\|^2$. We
can efficiently estimate a candidate phase offset by Newton-Raphson minimisation, using the
fact that the first two derivatives $\kappa^{(1)}$ and $\kappa^{(2)}$ of $\kappa$ satisfy

$$\kappa^{(1)}(o') = 2 \, q(o'; p)^T q^{(1)}(o'; p) = 2 \, b(o')^T [p]^T [p] \, b^{(1)}(o')$$

$$\kappa^{(2)}(o') = 2 \, b^{(1)}(o')^T [p]^T [p] \, b^{(1)}(o') + 2 \, b(o')^T [p]^T [p] \, b^{(2)}(o') \qquad (6.28)$$


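[81] gives the following useful formula for the first derivative of the degree d B-spline blending
function $b_{d,i}(o')$:

$$b^{(1)}_{d,i}(o') = \frac{d}{t_{i+d} - t_i} \, b_{d-1,i}(o') - \frac{d}{t_{i+d+1} - t_{i+1}} \, b_{d-1,i+1}(o') , \qquad (6.29)$$

from which it follows that

$$b^{(2)}_{d,i}(o') = \frac{d}{t_{i+d} - t_i} \, b^{(1)}_{d-1,i}(o') - \frac{d}{t_{i+d+1} - t_{i+1}} \, b^{(1)}_{d-1,i+1}(o') , \qquad (6.30)$$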


where the $t_i$ are the B-spline knots. Given an initial estimate $o'_0$, the Newton-Raphson algorithm
calculates an improvement $o'_n$ of the (n − 1)th estimate $o'_{n-1}$ as

$$o'_n = o'_{n-1} + \delta_{n-1} , \qquad \delta_{n-1} = -\frac{\kappa^{(1)}(o'_{n-1})}{\kappa^{(2)}(o'_{n-1})} . \qquad (6.31)$$

In cases where $\kappa$ is locally well-approximated by a quadratic function around $o'_{n-1}$, the algorithm
generally converges rapidly. But in other cases, the step $\delta_{n-1}$ may be too large. So at each
iteration we set $o'_n$ to an approximate local minimiser $o'_{n-1} + \delta'_{n-1}$ of $\kappa$, where $|\delta'_{n-1}| \leq |\delta_{n-1}|$.
We search for $\delta'_{n-1}$ using a simple dichotomous line search method (described, for example, in
[8]), which is a form of binary search in which one half of the search space (which is initially
set to $\delta'_{n-1}$'s range) is discarded in each iteration if $\kappa$'s value at a point in that half near the
midpoint is greater than its value near the midpoint in the other half. More sophisticated and
efficient line search methods could have been used (see [73] for a comprehensive review), but we
achieved very fast results with the dichotomous line search, and it was simpler to implement
than the alternatives.


We define multiple initial values for the search by selecting d − 1 values uniformly-distributed
over each knot span $[t_i, t_{i+1})$. This is the minimum number that should be used over each span,
since the degree d polynomial piece that each span corresponds to may have up to d − 1 turning
points in each dimension. From the results we obtained, it did not seem to be necessary to
sample each knot span more densely than this.


We use relative measures of convergence to determine when to terminate both the line search
and the Newton-Raphson minimisation. For the line search, suppose the search space has been
reduced to an interval $[o^\downarrow, o^\uparrow]$ after the $k$th binary subdivision. We calculate the maximum
difference $d_k$ between the values of $\kappa$ at $o^\downarrow$, $(o^\downarrow + o^\uparrow)/2$ and $o^\uparrow$, and we consider the search to
have converged when $d_k$ is less than the product of some threshold $\tau$ and the mean value of
$d_{1:k-1}$. For the Newton-Raphson minimisation, we consider convergence to have occurred when
$|\kappa(o'_n) - \kappa(o'_{n-1})|$ is less than the product of some threshold $\tau'$ and $|\kappa(o'_n)|$. In both cases,
upper bounds must be placed on the number of iterations that can be carried out, to prevent 
infinite loops. We found 30 iterations to be sufficient with 7 = 7’ = 10~°, and in many cases 


convergence occurred in far fewer iterations. 
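The combination of Newton-Raphson steps with a dichotomous line search can be sketched for a generic one-dimensional function. This is a minimal illustration rather than the implementation used for the experiments: it takes the objective and its first two derivatives as callables, uses an absolute interval-width tolerance for the line search instead of the relative measure described above, and its tolerances and iteration limits are arbitrary illustrative values. All names are hypothetical.

```python
import math


def dichotomous_search(f, lo, hi, tol=1e-9, eps_frac=0.01, max_iter=60):
    """Shrink [lo, hi] around a minimiser of a unimodal f by comparing two
    points placed just either side of the midpoint."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        eps = eps_frac * (hi - lo)
        if f(mid - eps) > f(mid + eps):
            lo = mid - eps  # the minimiser cannot lie to the left of mid - eps
        else:
            hi = mid + eps  # the minimiser cannot lie to the right of mid + eps
    return 0.5 * (lo + hi)


def safeguarded_newton(f, d1, d2, x0, rel_tol=1e-6, max_iter=30):
    """Newton-Raphson minimisation in which each full step only defines the
    interval that the line search is allowed to explore."""
    x = x0
    for _ in range(max_iter):
        curvature = d2(x)
        if curvature == 0.0:
            break
        step = -d1(x) / curvature
        if step == 0.0:
            break
        a, b = sorted((x, x + step))
        x_new = dichotomous_search(f, a, b)
        converged = abs(f(x_new) - f(x)) <= rel_tol * abs(f(x_new))
        x = x_new
        if converged:
            break
    return x


def toy(x):
    """Toy convex objective used only to exercise the sketch."""
    return x * x + math.exp(-x)


toy_min = safeguarded_newton(toy, lambda x: 2.0 * x - math.exp(-x),
                             lambda x: 2.0 + math.exp(-x), 2.0)
```

In the setting of this section, the search would be started from each of the initial values spread over the knot spans, and the best of the resulting local minimisers would be kept.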
Sampling the Scaling Factor


Given a candidate phase offset o, let $a_o$ be the area of grid $q(o; p)$ in pixels² (calculated by
adding up the areas of all of the grid's quadrilaterals), and let $[a^\downarrow, a^\uparrow]$ be the range of areas
(also in pixels²) that corresponds to the range of areas of the similarity-transformed canonical
grids that were used in the construction of the deformation mode model. When the DTM is
being used to simulate deformations of images from the video that it was constructed from (as in
the tests at the end of this chapter), these two ranges of areas are equal. But in the more general 
case, in which a DTM is constructed from multiple patient videos before being applied to a 
new video, the former range of areas would have to be determined from some depth-invariant 
measure of the areas of regions within the training videos and the new video. One possible 
way to define such a measure would be to consider the size of myocardial features that can be 
recognised across patients. Better yet, if optical flow fields could be reliably calculated between 
consecutive frames, the ranges of displacements across regions of the flow fields (the range over 
a region may be a useful indicator of area since the range of displacements over a region can be 
no less than the range over its subregions) could be exploited, calculating the ranges relative 
to the sizes of the regions in pixels, to obtain measures of area that have a degree of invariance 


to the camera’s distance from the myocardium. 


The squared scaling factor $\rho^2$ should be uniformly sampled over a range defined such that the
area of the initial grid $f(h'^{-1}_0(g'_0))$ always lies within $[a^\downarrow, a^\uparrow]$, which implies that $\rho^2$'s range
should be a subset of $[a^\downarrow / a_o, \; a^\uparrow / a_o]$. In addition to this, we would also ideally like grid
$f(h'^{-1}_t(g'_t))$ to contain as much of the area occupied by the patches used in the construction of the
$D_*$ set as possible, for each time step t. However, it would not be easy to identify the smallest value
of $\rho^2$ for which each $f(h'^{-1}_t(g'_t))$ at least contains a pre-specified proportion of the area of each patch,
due to the non-analytical relationship between the value of $\rho$ and the state of the best-fitting
affine transformation $\Theta'_t$. So instead, we use the simple heuristic of ensuring that $\rho$ is large
enough for the initial grid to contain as many time t = 1 patches used in the construction of $D_*$
as possible, subject to the first constraint on $\rho^2$ above being satisfied. This involves identifying
the shortest distance $d^\downarrow$ from the origin to a vertex of $q(o; p)$, and an upper bound $m^\uparrow$ on the
distances between the initial feature position $d(\Theta_0)$ and the possible positions of the vertices
of the time t = 1 patches, so that we may define $\rho^2$'s range as

$$\left[ \min\left( \frac{a^\uparrow}{a_o}, \; \max\left( \frac{a^\downarrow}{a_o}, \left( \frac{m^\uparrow}{d^\downarrow} \right)^2 \right) \right), \; \frac{a^\uparrow}{a_o} \right] . \qquad (6.32)$$
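The range computation and the sampling of the scaling factor can be written out directly. This is a small sketch under the reading of the range given above (a lower bound that is pushed up towards the coverage requirement but capped by the largest permitted area ratio); the function and parameter names are hypothetical.

```python
import math
import random


def squared_scale_range(a_low, a_high, a_o, m_up, d_down):
    """Return (lower, upper) bounds on the squared scaling factor.
    a_low, a_high: permitted grid area range; a_o: area of the unscaled grid;
    m_up: upper bound on patch vertex distances; d_down: shortest distance
    from the origin to a vertex of the unscaled grid."""
    upper = a_high / a_o
    lower = min(upper, max(a_low / a_o, (m_up / d_down) ** 2))
    return lower, upper


def sample_scale(a_low, a_high, a_o, m_up, d_down, rng):
    """Sample the squared scale uniformly over its range and return the scale."""
    lower, upper = squared_scale_range(a_low, a_high, a_o, m_up, d_down)
    return math.sqrt(rng.uniform(lower, upper))
```

When the coverage requirement exceeds what the area constraint allows, the range collapses to a single value, so the area constraint always takes priority.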


For $D_F$ (= $D_L$), we specify each neighbourhood $N^{(k)}_t$ as a set of bounds on the rotational,
scaling, skewing and translational components that we will ultimately combine as in eq. (3.1)
on page 42 to define the perturbed patch states. So given values for these bounds, we scale the
initial patch $V(\Theta_0)$ by the maximum amount permitted by $N^{(k)}_t$ and try all four combinations
of positively/negatively skewing it by the maximum amount in the x/y directions, and then
we define $m^\uparrow$ as the sum of the maximum distance from $d(\Theta_0)$ to one of $V(\Theta_0)$'s transformed
vertices, and the maximum distance by which $N^{(k)}_t$ allows the patch to be displaced. We define
$m^\uparrow$ similarly for $D_\ell$, using a similar definition of the range of Gaussian perturbations, but we
add the radius of the square subregions used in the definition of $\ell_t$ as an extra term. Our
definition of $m^\uparrow$ for $D_B$ is also similar to our definition of it for $D_F$, except that this time the
bounds on the magnitudes of the affine motion components are given by the d', d°, «1, Kh and 


s! parameters that we used to define the domains of the importance sampling distributions in 


section 3.3.3. 


Final Steps 


After calculating/sampling the parameters of the similarity transformation f, the final steps
are to calculate the affine transformations $\Theta'_t$ that best fit the deformation grids $f(h'^{-1}_t(g'_t))$,
and to randomly perturb them. We estimate each $\Theta'_t$ using a 10 × 10 grid of points P
regularly distributed over the initial patch $V(\Theta_0)$. We deform P with $f(h'^{-1}_t(g'_t))$ (using bilinear
interpolation, as described in section 5.2) to form a point set P', and calculate the affine
transformation f that best maps P to P' using the Moore-Penrose pseudoinverse, as described in
section 5.3.2. Finally, we define $\Theta'_t$ in terms of the translational component of f and the RSK
decomposition of its non-translational component.
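The least-squares affine fit can be sketched without external libraries. The code below solves the normal equations, which gives the same answer as the pseudoinverse whenever the source points are not all collinear (as is the case for a regular grid); the RSK decomposition of the result is not attempted. All function names are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, 3))
        sol[r] = (aug[r][3] - tail) / aug[r][r]
    return sol


def fit_affine(src, dst):
    """Least-squares fit of dst ~ A @ src + t for lists of 2D points.
    Returns (A, t) with A a 2x2 nested list and t a list of length 2."""
    rows = [(x, y, 1.0) for x, y in src]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for dim in range(2):
        xty = [sum(r[i] * d[dim] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(xtx, xty))
    (a, b, tx), (c, d, ty) = params
    return [[a, b], [c, d]], [tx, ty]
```

Given the 10 × 10 grid P and its deformed counterpart P', `fit_affine(P, P_prime)` would return the linear part and translation from which the patch state is then derived.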
To calculate a random perturbation $\Theta' + \delta\Theta$, we first of all draw samples $\delta d$, $\delta\phi$, $(\delta\kappa_x, \delta\kappa_y)$ and
$(\ln(\delta s_x), \ln(\delta s_y))$ from zero-mean uniform or Gaussian distributions, as appropriate (when using
Gaussian distributions, we define the standard deviations to be a third of the sampling range),
and we also sample a uniform binary random variable $b \in \{0, 1\}$ that we use to decide whether
to skew in the x or y direction. We then update $\Theta'$ by adding $\delta d$, $\delta\phi$ and $(b \, \delta\kappa_x, (1 - b) \, \delta\kappa_y)$ to
its displacement, rotation and skewing components, and replacing its scaling components with
$(\delta s_x \, s_x(\Theta'), \; \delta s_y \, s_y(\Theta'))$.


6.4.3 Stretched Exponential Likelihood Parameterisation 


The regions surrounding the initial patches, over which we calculate $D_B$, vary greatly, leaving
us with no prior expectations about the form of the distribution of $D_B$'s values. So we construct
the background likelihoods empirically, using a Gaussian kernel over $D_B$ to estimate $B_G$:

$$B_G(\delta) = \sum_{\delta' \in D_B} N(\delta - \delta'; 0, \sigma) , \qquad (6.33)$$

for some standard deviation $\sigma$. For efficiency, we use a discretisation of $B_G$ that we calculate by
convolving the Gaussian kernel with a histogram defined over $D_B$. We manually select a value
for $\sigma$ that removes local density fluctuations from the histogram while preserving its overall
shape. Although this gave adequate results, it may be possible to derive better estimates of
$B_G$'s values in low density areas using a variable bandwidth density estimator, such as one of
the estimators described in [109]. We have not taken this more complex approach however, as
we feel that the most pertinent way to improve the accuracy of our hypothesis tests would be
to relax some of the assumptions/restrictions we have made in defining $B_G$ and $F_G$, e.g. by
using images other than the initial image as references.
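The discretised kernel estimate can be sketched as a histogram convolved with a Gaussian. The sketch assumes that the difference samples lie in [0, 1], uses an arbitrary bin count, and divides by the number of samples so that the result integrates to roughly one (a normalisation choice made for this example). The function name and defaults are hypothetical.

```python
import math


def histogram_kde(samples, sigma, n_bins=100, lo=0.0, hi=1.0):
    """Bin the samples, then place a Gaussian of width sigma at each occupied
    bin centre, weighted by its count. Returns (bin_centres, density)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        index = min(max(int((s - lo) / width), 0), n_bins - 1)
        counts[index] += 1
    centres = [lo + (i + 0.5) * width for i in range(n_bins)]
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    density = []
    for c in centres:
        total = sum(n * math.exp(-0.5 * ((c - cj) / sigma) ** 2)
                    for n, cj in zip(counts, centres) if n)
        density.append(norm * total)
    return centres, density
```

Working from bin counts rather than individual samples means the cost depends on the number of bins rather than the size of the sample set, which is the efficiency gain referred to above.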


While we make no prior assumptions about the form of the distribution over $D_B$, we do have
prior expectations about the forms of the distributions over $D_F$/$D_L$ and $D_\ell$, so it is preferable
to fit a parametric model that enforces these expectations to the distributions over these sets.
Such models have the added benefits of typically only having a small number of parameters to
estimate, and alleviating the difficulties we would otherwise face in trying to estimate the
probabilities of rare events.


Figure 6.3: Stretched exponential distributions for various values of $\alpha$ and $\gamma$.


The main prior assumption we wish to impose upon the distributions over $D_F$/$D_L$ and $D_\ell$ is
that the likelihood of an image patch difference $\Delta$ is a continuous, strictly-decreasing function
of $\Delta$. As mentioned in section 3.3.1, the distribution with these properties that we choose to use
for the importance sampler $\ell_t$ and patch likelihood function $L_t$ is the two-parameter stretched
exponential distribution

$$\varphi(\Delta; \alpha, \gamma) \propto \exp\left\{ -\left( \frac{\Delta}{\gamma} \right)^{\alpha} \right\} , \quad \alpha > 0, \; \gamma > 0 , \qquad (6.34)$$

which includes the exponential distribution and the single-tailed Gaussian distribution as special
cases (when the stretching exponent $\alpha$ is set to 1 and 2, respectively; see figure 6.3 for examples
of the distribution under various parameterisations). We assume that the foreground likelihood
$F_G$ is of the same form, since we define $D_F = D_L$, and so we only have to estimate two sets
of hyperparameters. Whilst we were content to use the distribution in unnormalised form
earlier, the normalising constant is vital to our present discussion, as our aim is to estimate the
parameters by the maximum-likelihood method.


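The stretched exponential distribution is a single-tailed special case of the generalised normal
distribution described in [71]. From the results published in that article, it follows that the
normalised stretched exponential PDF $\hat{\varphi}(\Delta; \alpha, \gamma)$ is

$$\hat{\varphi}(\Delta; \alpha, \gamma) \triangleq \frac{\alpha \, \varphi(\Delta; \alpha, \gamma)}{\gamma \, \Gamma(1/\alpha)} \qquad (6.35)$$

and the CDF $\Phi(\Delta; \alpha, \gamma)$ is

$$\Phi(\Delta; \alpha, \gamma) = \frac{\Gamma(1/\alpha) - \Gamma\left(1/\alpha, \, (\Delta/\gamma)^{\alpha}\right)}{\Gamma(1/\alpha)} , \qquad (6.36)$$

where $\Gamma(s, x)$ is the upper incomplete Gamma function:

$$\Gamma(s, x) = \int_x^{\infty} t^{s-1} e^{-t} \, dt , \qquad (6.37)$$

and

$$\Gamma(s) = \Gamma(s, 0) \qquad (6.38)$$

is the complete Gamma function.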
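The normalised density can be checked numerically with the standard library. In this sketch the PDF uses the normalising constant of the one-sided generalised normal distribution, and the CDF is obtained by trapezoidal integration of the PDF because the standard library has no incomplete Gamma function; the function names and the step count are hypothetical choices.

```python
import math


def stretched_exp_pdf(delta, alpha, gamma):
    """Normalised stretched exponential density on [0, infinity)."""
    if delta < 0.0:
        return 0.0
    norm = alpha / (gamma * math.gamma(1.0 / alpha))
    return norm * math.exp(-((delta / gamma) ** alpha))


def stretched_exp_cdf(delta, alpha, gamma, steps=10000):
    """Numerical CDF: trapezoidal integral of the density over [0, delta]."""
    if delta <= 0.0:
        return 0.0
    h = delta / steps
    total = 0.5 * (stretched_exp_pdf(0.0, alpha, gamma)
                   + stretched_exp_pdf(delta, alpha, gamma))
    for i in range(1, steps):
        total += stretched_exp_pdf(i * h, alpha, gamma)
    return total * h
```

Setting the exponent to 1 recovers the exponential distribution with mean equal to the scale parameter, and setting it to 2 with scale √2 recovers the half-normal distribution with unit standard deviation, which gives two simple sanity checks.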


Before calculating maximum likelihood estimates of the stretched exponential parameters, it
is prudent to inspect the empirical distributions of the image patch difference sets $D_*$ to see
how close they are to stretched exponential distributions. Figure 6.4 shows typical examples of
the empirical distributions. The most striking thing about these distributions is the fact that 
they do not peak at zero. This would almost certainly always happen regardless of whether 
the histograms were calculated under appearance changes caused by simulated deformations, 
or appearance changes caused by the true deformations, since the fact that the myocardium is 
in perpetual motion makes it very unlikely for any patch to have the exact same appearance 


(or extremely similar appearances) in many frames. 


Figure 6.4: These figures show histograms for the $D_L$ and $D_\psi$ sample sets generated for some
patches from the 230108.FO1 sequence and the 180608.HS sequence. The rightmost number
underneath each figure is an identifier for the patches; figure 6.8 shows what these patches
looked like. The stretched exponential distributions that maximise the conditional likelihood
of each sample set are shown as (truncated) solid red lines.

It is of course highly unlikely that any stretched exponential random variable could generate
distributions like these that peak so far from zero, but an alternative model of $F_\Theta$, $L_t$ and
$\phi_t$ that gives greater likelihood to large patch differences than to small differences would be
objectionable from a practical point of view, since it would bring about the absurd phenomenon
of the particle filter shying away from rarely occurring states in which the patch looks very
similar to its initial state! So for each sample set $D_\star$, we calculate the stretched exponential
hyperparameters $\alpha_\star$ and $\gamma_\star$ that maximise the conditional likelihood of the elements of $D_\star$ that
lie in the upper tail of its histogram, under the assumption that the rate of decay of the upper
tail is a reasonable indicator of how fast the stretched exponential should tend to zero. Noting
that the elements of $D_\star$ are always $\le 1$, by definition of $\xi$, we want to find


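$$(\alpha_\star, \gamma_\star) = \arg\max_{\alpha > 0,\, \gamma > 0} \prod_{\delta \in D_\star,\ \delta \ge \delta^\downarrow} L(\delta; \alpha, \gamma), \tag{6.39}$$

for some lower bound $\delta^\downarrow$ on the difference samples, where

$$L(\delta; \alpha, \gamma) \triangleq \frac{\varphi(\delta; \alpha, \gamma)}{\Phi(1; \alpha, \gamma) - \Phi(\delta^\downarrow; \alpha, \gamma)}. \tag{6.40}$$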


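To simplify matters and make the problem amenable to being solved by standard minimisation
algorithms, we take the usual approach of minimising the negative log-likelihood

$$(\alpha_\star, \gamma_\star) = \arg\min_{\alpha > 0,\, \gamma > 0} f(\alpha, \gamma), \tag{6.41}$$

where

$$f(\alpha, \gamma) \triangleq -\ln \prod_{\delta \in D_\star,\ \delta \ge \delta^\downarrow} L(\delta; \alpha, \gamma) = -\sum_{\delta \in D_\star,\ \delta \ge \delta^\downarrow} \left\{ \ln\alpha - \left(\frac{\delta}{\gamma}\right)^{\alpha} - \ln\gamma - \ln h(\alpha, \gamma) \right\},$$

$$h(\alpha, \gamma) \triangleq \Gamma\left(\frac{1}{\alpha}, \left(\frac{\delta^\downarrow}{\gamma}\right)^{\alpha}\right) - \Gamma\left(\frac{1}{\alpha}, \left(\frac{1}{\gamma}\right)^{\alpha}\right) \tag{6.42}$$

(care must be taken when evaluating $\ln h$, as $h$ may be rounded down to 0 when $\gamma$ is very
small). We turn this into an unconstrained optimisation problem by defining

$$\alpha = e^{a}, \quad \gamma = e^{g}, \tag{6.43}$$

and minimising $f(e^a, e^g)$ with respect to $a$ and $g$. We found MATLAB’s implementation of the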
trust-region-based method described in [19, 20] to be an effective solution to the problem. This 
method works by iteratively minimising quadratic approximations of the objective function f 
over a neighbourhood of the current solution. The quadratic approximations are calculated 
from the gradients of f and from approximations of the Hessian, so we need to define the first 


derivatives of f with respect to a and g. 


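By repeated application of the chain rule, the derivatives can be expressed as follows:

$$\begin{aligned}
\frac{\partial f(e^a, e^g)}{\partial a} &= \alpha \frac{\partial f(\alpha, \gamma)}{\partial \alpha}, \qquad
\frac{\partial f(e^a, e^g)}{\partial g} = \gamma \frac{\partial f(\alpha, \gamma)}{\partial \gamma}, \\
\frac{\partial f(\alpha, \gamma)}{\partial \alpha} &= -\sum_{\delta \in D_\star,\ \delta \ge \delta^\downarrow} \left\{ \frac{1}{\alpha} - \left(\frac{\delta}{\gamma}\right)^{\alpha} \ln\frac{\delta}{\gamma} - \frac{1}{h(\alpha, \gamma)} \frac{\partial h(\alpha, \gamma)}{\partial \alpha} \right\}, \\
\frac{\partial f(\alpha, \gamma)}{\partial \gamma} &= -\sum_{\delta \in D_\star,\ \delta \ge \delta^\downarrow} \left\{ \frac{\alpha}{\gamma}\left(\frac{\delta}{\gamma}\right)^{\alpha} - \frac{1}{\gamma} - \frac{1}{h(\alpha, \gamma)} \frac{\partial h(\alpha, \gamma)}{\partial \gamma} \right\}, \\
\frac{\partial h(\alpha, \gamma)}{\partial \alpha} &= -\frac{1}{\alpha^2} \left( \left.\frac{\partial \Gamma(x, y)}{\partial x}\right|_{x = \frac{1}{\alpha},\, y = \left(\frac{\delta^\downarrow}{\gamma}\right)^{\alpha}} - \left.\frac{\partial \Gamma(x, y)}{\partial x}\right|_{x = \frac{1}{\alpha},\, y = \left(\frac{1}{\gamma}\right)^{\alpha}} \right) \\
&\qquad - \frac{\delta^\downarrow}{\gamma} \ln\frac{\delta^\downarrow}{\gamma}\, e^{-\left(\frac{\delta^\downarrow}{\gamma}\right)^{\alpha}} + \frac{1}{\gamma} \ln\frac{1}{\gamma}\, e^{-\left(\frac{1}{\gamma}\right)^{\alpha}}, \\
\frac{\partial h(\alpha, \gamma)}{\partial \gamma} &= \frac{\alpha}{\gamma} \left( \frac{\delta^\downarrow}{\gamma}\, e^{-\left(\frac{\delta^\downarrow}{\gamma}\right)^{\alpha}} - \frac{1}{\gamma}\, e^{-\left(\frac{1}{\gamma}\right)^{\alpha}} \right).
\end{aligned} \tag{6.44}$$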


Geddes et al. give a closed-form solution for $\partial\Gamma(x, y)/\partial x$ in [35] in terms of a very general special
function known as the Meijer G function, which, under various parameterisations, reduces to
many more well-known special functions. Unfortunately, this function does not currently seem
to be widely supported by numerical libraries, so we estimate the derivative by the central
difference method instead:

$$\frac{\partial \Gamma(x, y)}{\partial x} \approx \frac{\Gamma(x + \delta x, y) - \Gamma(x - \delta x, y)}{2\,\delta x} \tag{6.45}$$

for some small $\delta x$. We use a simple expression for $\delta x$ given in [82] that approximately minimises
the errors that result from truncation of the higher-order Taylor terms in the central difference
approximation and the errors due to floating-point roundoff:

$$\delta x = \sqrt[3]{\epsilon}\, x, \tag{6.46}$$

where $\epsilon$ is the machine epsilon value, i.e. the smallest positive number for which $1 + \epsilon \ne 1$ in
floating-point arithmetic.
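The central difference estimate can be sketched in a few lines. The sketch below assumes a step equal to the cube root of machine epsilon times $x$, which is the usual recommendation for central differences; the function name is illustrative, and any function of the differentiated argument can be passed in (for the case discussed here, the incomplete Gamma function with its second argument held fixed).

```python
import math
import sys


def central_difference(func, x):
    """Estimate d func / dx at x with a roundoff-aware step size."""
    eps = sys.float_info.epsilon
    scale = abs(x) if x != 0 else 1.0        # fall back to an absolute step at x = 0
    step = eps ** (1.0 / 3.0) * scale
    upper = x + step
    lower = x - step
    # Divide by the step actually taken, which may differ slightly
    # from 2 * step after floating-point rounding.
    return (func(upper) - func(lower)) / (upper - lower)
```

For example, `central_difference(math.exp, 1.0)` agrees with `math.e` to roughly ten significant figures.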


To determine the lower bound $\delta^\downarrow$, we convolve $D_\star$’s histogram with a Gaussian kernel, and then
set $\delta^\downarrow$ to the centre of the bin that has the greatest mass. We then initialise the minimiser by
calculating $f$ at values of $\alpha$ and $\gamma$ distributed over a uniform grid, and setting the initial state


to the grid point at which f is minimised. 
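The two initialisation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the bin count, the kernel width (in bins), the truncation of the kernel at three standard deviations and both function names are hypothetical choices, not values taken from our implementation.

```python
import math


def smoothed_mode(samples, n_bins=20, sigma_bins=1.0, lo=0.0, hi=1.0):
    """Centre of the heaviest bin after Gaussian smoothing of the histogram."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        k = min(int((s - lo) / width), n_bins - 1)  # clamp s == hi into the last bin
        counts[k] += 1
    radius = max(1, int(3 * sigma_bins))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (j / sigma_bins) ** 2) for j in offsets]
    smoothed = []
    for k in range(n_bins):
        acc = 0.0
        for j, weight in zip(offsets, kernel):
            if 0 <= k + j < n_bins:
                acc += weight * counts[k + j]
        smoothed.append(acc)
    best = max(range(n_bins), key=smoothed.__getitem__)
    return lo + (best + 0.5) * width


def grid_start(objective, alphas, gammas):
    """Grid point (alpha, gamma) at which the objective is smallest."""
    return min(((a, g) for a in alphas for g in gammas),
               key=lambda point: objective(*point))
```

The value returned by `smoothed_mode` would serve as the lower bound on the tail samples, and the pair returned by `grid_start` as the minimiser's initial state.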


6.4.4 Selecting the Test Threshold 


The final parameter that we must define before we can carry out the loss hypothesis test is
the comparison threshold $c$. Under the assumption that our parameterisations of $B_\Theta$ and $F_\Theta$
give reasonable approximations of $P(\Delta_t^{[i]} \mid \omega_t^{[i]} = \top, \mathcal{I}_{[0,t]}, \Theta_0)$ and
$P(\Delta_t^{[i]} \mid \omega_t^{[i]} = \bot, \mathcal{I}_{[0,t]}, \Theta_0)$ respectively, we choose a value of $c$ that minimises the
probability of type I and type II test errors.


For any given choice of $c$, the probability of a type I error occurring, in which the test incorrectly
concludes that a lost particle is not lost, is given by integrating $B_\Theta$ over the set

$$E_I(c) = \{\delta \ge 0 : B_\Theta(\delta) < cF_\Theta(\delta)\}, \tag{6.47}$$

and the probability of a type II error, in which the test incorrectly concludes that a particle
that is not lost is lost, is given by integrating $F_\Theta$ over

$$E_{II}(c) = \{\delta \ge 0 : B_\Theta(\delta) > cF_\Theta(\delta)\}. \tag{6.48}$$

We define $c$ as the minimiser of

$$f(c') \triangleq \int_{E_I(c')} B_\Theta(\delta)\, d\delta + \int_{E_{II}(c')} F_\Theta(\delta)\, d\delta, \tag{6.49}$$

which is proportional to the probability of either a type I error occurring or a type II error
occurring, under the assumption that the prior probability of a particle being lost is equal to
the prior probability of it not being lost, i.e. assuming

$$P(\omega_t^{[i]} = \top \mid \mathcal{I}_{[0,t]}, \Theta_0) = P(\omega_t^{[i]} = \bot \mid \mathcal{I}_{[0,t]}, \Theta_0). \tag{6.50}$$


This is probably not an accurate prior assumption, but it is the most noncommittal assumption 


we can make without further analysis. 


Suppose we have defined $B_\Theta$ over $N$ bins with boundaries $[b^\downarrow, b^\downarrow + w), \ldots, [b^\downarrow + (N-2)w, b^\downarrow +
(N-1)w), [b^\downarrow + (N-1)w, b^\uparrow = b^\downarrow + Nw]$, for some bin width $w$. Noting that $B_\Theta(\delta) = 0 < cF_\Theta(\delta)$
for all $\delta \notin [b^\downarrow, b^\uparrow]$ and $c > 0$, it follows that we only need to integrate the first term of $f$ over
$\delta \in [b^\downarrow, b^\uparrow]$. So we carry out the integration by adding up the integrals over each bin of $B_\Theta$.

As we model $F_\Theta$ as a strictly-decreasing function and $B_\Theta$ as a histogram, which is a piecewise-
uniform function, it is simple to calculate the integral over each bin. For any bin $b$, there are


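three possible cases to consider, as illustrated in figure 6.5. The cases are:

Case 1:
$$\forall \delta \in b \;\; B_\Theta(\delta) < cF_\Theta(\delta); \tag{6.51}$$

Case 2:
$$\forall \delta \in b \;\; B_\Theta(\delta) > cF_\Theta(\delta); \tag{6.52}$$

Case 3:
$$\exists x \in b \;\, \forall \delta \in b \, \big(\delta < x \rightarrow B_\Theta(\delta) < cF_\Theta(\delta) \;\wedge\; \delta > x \rightarrow B_\Theta(\delta) > cF_\Theta(\delta)\big). \tag{6.53}$$

Figure 6.5: The three possible states (Case 1, Case 2 and Case 3) of $B_\Theta(b)$ relative to $cF_\Theta(b)$,
for some bin $b$ and some value of $c$.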


Let $b_0$ and $b_1$ be the lower and upper bounds of $b$. To determine whether or not the first case
holds, it suffices to check whether or not $B_\Theta(b_1) < cF_\Theta(b_1)$. In this case, $b \subset E_I(c)$, and so $b$’s
contribution to $f$ is

$$wB_\Theta(b_0). \tag{6.54}$$

Similarly, we can determine whether or not the second case holds by checking whether or not
$B_\Theta(b_0) > cF_\Theta(b_0)$. In this case $b \subset E_{II}(c)$, and $b$’s contribution to $f$ is

$$\Phi(b_1; \alpha_L, \gamma_L) - \Phi(b_0; \alpha_L, \gamma_L), \tag{6.55}$$

where $\Phi$ is the stretched exponential CDF, as in the previous section, and $\alpha_L$ and $\gamma_L$ are the
parameters of the foreground/patch likelihood $F_\Theta$/$L_t$. When neither the first case nor the
second holds, we can determine the value of $x$ under which the third case holds by using the
inverse stretched exponential $\varphi^{-1}$:

$$\varphi^{-1}(y; \alpha, \gamma) = \gamma \left(-\ln\left(\frac{y\,\gamma\,\Gamma(1/\alpha)}{\alpha}\right)\right)^{1/\alpha}, \tag{6.56}$$

so that $x = \varphi^{-1}(B_\Theta(b_0)/c;\, \alpha_L, \gamma_L)$. The section of $b$ below $x$ is in $E_I(c)$, and the upper section is
in $E_{II}(c)$, so $b$’s contribution to $f$ is given by

$$(x - b_0)B_\Theta(b_0) + \Phi(b_1; \alpha_L, \gamma_L) - \Phi(x; \alpha_L, \gamma_L). \tag{6.57}$$

Figure 6.6: (a) The red curve shows $cF_\Theta(\delta)$ as a function of $\delta$ for feature 35 of the 180608.HS
sequence, using optimal values for $c$, $\alpha_L$ and $\gamma_L$, calculated as described in this section and the
previous section. The discontinuous blue line segments show $B_\Theta(\delta)$ for the same feature. (b) A
graph of $f(c)$ for the same feature. $f$ has numerous plateaus, which arise due to the fact that
$cF_\Theta$ does not always intersect $B_\Theta$.
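The per-bin case analysis can be sketched as follows. To keep the sketch self-contained it assumes a stretching exponent of 1, for which the density, CDF and inverse density have simple closed forms; the general case would substitute the stretched exponential equivalents. All function and parameter names are illustrative.

```python
import math


def exp_pdf(delta, gamma):
    """Stretched exponential density with stretching exponent 1."""
    return math.exp(-delta / gamma) / gamma


def exp_cdf(delta, gamma):
    return 1.0 - math.exp(-delta / gamma)


def exp_pdf_inverse(y, gamma):
    """Value of delta at which exp_pdf(delta, gamma) equals y."""
    return -gamma * math.log(y * gamma)


def error_objective(c, b_low, width, heights, gamma):
    """Sum of type I and type II error integrals over all histogram bins."""
    total = 0.0
    for i, height in enumerate(heights):
        b0 = b_low + i * width
        b1 = b0 + width
        if height < c * exp_pdf(b1, gamma):
            # Case 1: the histogram lies below the scaled foreground density
            # over the whole bin, so the bin contributes its own mass.
            total += width * height
        elif height > c * exp_pdf(b0, gamma):
            # Case 2: the histogram lies above it over the whole bin, so the
            # bin contributes the foreground mass.
            total += exp_cdf(b1, gamma) - exp_cdf(b0, gamma)
        else:
            # Case 3: the curves cross at x inside the bin.
            x = exp_pdf_inverse(height / c, gamma)
            x = min(max(x, b0), b1)  # guard against rounding just outside the bin
            total += (x - b0) * height + exp_cdf(b1, gamma) - exp_cdf(x, gamma)
    return total
```

For a normalised histogram, a very large threshold places every bin in case 1 and the objective tends to 1, while a very small threshold places every non-empty bin in case 2 and the objective tends to the foreground mass over the histogram's range.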


$f$ is a continuous function as, informally, small changes in $c$ always induce small changes in
$E_I(c)$ and $E_{II}(c)$, and the integrals of $F_\Theta$ and $B_\Theta$ vary continuously with the integration limits.
However f is not differentiable everywhere, and it will often have many plateaus separated 
by extremely sharp transitions (e.g. see figure 6.6), and it may even have local minima, so 
gradient-based optimisation methods are not appropriate. Instead, we approximately minimise 


f using a variant of the dichotomous line search method that we described in section 6.4.2. 


To begin with, we must calculate the bounds of a range $[c^\downarrow, c^\uparrow]$ over which to search for the
minimum. Despite the possibility of local fluctuations in $f$’s value, its overall form seems to
be the same for most features: it initially undergoes a rapid decrease in value over very small
values of $c$, and then its value increases gradually. We can easily identify a value of $c^\downarrow$ within
the range over which $f$ rapidly decreases by simply selecting a value such that case 2 above
holds for all bins. To do this, we calculate

$$c' = \min_{i,\ B_\Theta(b^\downarrow + iw) > 0} \frac{B_\Theta(b^\downarrow + iw)}{\varphi(b^\downarrow + iw; \alpha_L, \gamma_L)}, \tag{6.58}$$

and then set $c^\downarrow$ to some value in $(0, c')$. The precise value is unimportant, and we arbitrarily
choose a power of 2 that is $< c'$. We then need to identify a value of $c^\uparrow$ that is large enough
for the search range to contain the minimum. We set the initial candidate value $c''$ of $c^\uparrow$ to $c^\downarrow$,
and then perform an exponentially growing iterative search, each iteration of which involves
evaluating $f(c'')$, keeping track of the candidate value $c'''$ at which $f$ was minimised, and then
replacing $c''$ with $2c''$. We terminate the search when $f(c'') > kf(c''')$ for some $k > 1$ (we found
$k = 2$ to be sufficient), and set $c^\uparrow = c''$.


We then search $[c^\downarrow, c^\uparrow]$ for the minimising value of $c$ much as before, with the exception that we
cannot base our decision of which half of the search space to discard in each iteration on the
values of $f$ near the midpoint, due to the plateaus. So instead, we calculate $f$ at $m$ uniformly
distributed values of $c$ in each half, and discard the half that does not contain the minimum
value of $f$ over these sample points. We found $m = 4$ to be a sufficient number of sample
points. We terminate the search either: when the maximum difference between $f$’s values at
any pair of the sample points is no greater than the product of its maximum value at a sample
point and a threshold $\tau$; or when the number of iterations exceeds some upper bound. We used


Tt = 1E™ and allowed up to 100 iterations. 
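The bracketing and search procedure can be sketched as below. This is an illustrative sketch rather than our implementation: the function names, the placement of the sample points at the centres of equal subintervals, and the default tolerance `tol` are assumptions, and the objective is assumed to be strictly positive.

```python
def bracket_upper(f, c_low, k=2.0, max_doublings=200):
    """Double a candidate until f exceeds k times the best value seen so far."""
    candidate = c_low
    best = float("inf")
    for _ in range(max_doublings):
        value = f(candidate)
        best = min(best, value)
        if value > k * best:
            return candidate
        candidate *= 2.0
    return candidate


def sampled_halving_search(f, c_low, c_high, m=4, tol=1e-6, max_iter=100):
    """Halve the range, keeping the half whose m samples contain the minimum."""
    left, right = [c_low], [c_high]
    for _ in range(max_iter):
        mid = 0.5 * (c_low + c_high)
        left = [c_low + (mid - c_low) * (j + 0.5) / m for j in range(m)]
        right = [mid + (c_high - mid) * (j + 0.5) / m for j in range(m)]
        left_values = [f(c) for c in left]
        right_values = [f(c) for c in right]
        values = left_values + right_values
        # Stop once the samples are indistinguishable relative to their maximum.
        if max(values) - min(values) <= tol * max(values):
            break
        if min(left_values) <= min(right_values):
            c_high = mid   # discard the upper half
        else:
            c_low = mid    # discard the lower half
    return min(left + right, key=f)
```

Sampling several points per half, rather than comparing values near the midpoint, is what allows the search to make progress across plateaus.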


6.5 Some Methods for Restoring Lost Particles 


Figure 6.7 (panels: Time 0, Time $t-1$): The green and red particles are to compare the blue time $t-1$
subregion to time $t$ subregions using image information from time 0. As the time $t-1$ states of the
particles are very different, the subregions of the first frame from which they will retrieve information
(indicated by the small green and red squares) are also very different.

Now that we have described the methods by which we estimate all of the parameters of our
particle loss test, the final issue to be dealt with is the restoration of particles that our test
considers to be lost. The simplest method that we have tried is a resampling approach, in
which the $m$ lost particles $\bar{\Theta}_t^{[1,m]}$ are resampled from the $n$ particles $\Theta_t^{[1,n]}$ that are not marked as
lost, in such a way that the resulting set of $m + n$ particles has the same weighted distribution
as $\Theta_t^{[1,n]}$. To do this, we first of all associate each lost particle $\bar{\Theta}_t^{[i]}$ with a particle $\Theta_t^{[f(i)]}$, where
$f(i) \in \{1, \ldots, n\}$ is a random variable that takes a value $j$ with probability proportional to the
weight $w_t^{[j]}$ of the $j$th non-lost particle. After sampling all $m$ values of $f$, we replace each $\bar{\Theta}_t^{[i]}$
with $\Theta_t^{[f(i)]}$ and count the frequency $\lambda(j)$ with which $f$ took value $j$, for each $j$. We then set the
weights of particle $\Theta_t^{[j]}$ and the $\lambda(j)$ lost particles that were replaced with it to $w_t^{[j]} / (\lambda(j) + 1)$.
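The resampling step can be sketched as follows, assuming that each surviving particle shares its weight equally with the lost particles that are replaced by copies of it, so that the weighted distribution is unchanged. The function name and the representation of particle states as arbitrary Python objects are illustrative.

```python
import random
from collections import Counter


def restore_by_resampling(kept_states, kept_weights, n_lost, rng=random):
    """Replace n_lost particles with weighted draws from the kept particles."""
    # Draw a source index for each lost particle, proportionally to weight.
    sources = rng.choices(range(len(kept_states)), weights=kept_weights, k=n_lost)
    copies = Counter(sources)
    # Each kept particle splits its weight evenly between itself and its copies.
    shared = [w / (copies[j] + 1) for j, w in enumerate(kept_weights)]
    new_states = list(kept_states) + [kept_states[j] for j in sources]
    new_weights = shared + [shared[j] for j in sources]
    return new_states, new_weights
```

Because every copy carries the same state as its source, the total weight attached to each distinct state is the same before and after restoration.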


While resampling has the expected effect of moving lost particles to states of high posterior 
probability, the resulting set of particles cannot be any more diverse than the original set of 
non-lost particles. Furthermore, resampling cannot be carried out at all when all particles are 
lost. An alternative particle restoration technique that does not suffer from these problems is 
to use information from earlier frames to define the importance sampling distribution $\phi_t$. If
information is used from a frame in which the state of the patch is known to a high degree of 


certainty, the cause of particle drift can potentially be avoided. 


The frame in which the patch’s state is known to the greatest degree of certainty will of course
always be the first frame $\mathcal{I}_0$, in which we define its initial state. As discussed earlier, we
normally define $\phi_t$ by combining information about the differences $\Delta_{t-1,t}[S(g), S(g) + d]$ between
subregions $S(g)$ of image $\mathcal{I}_{t-1}$ and $S(g) + d$ of image $\mathcal{I}_t$. So to make use of the first frame in
the definition of $\phi_t$ for particle $\Theta_{t-1}^{[i]}$, we can use the projection of the image data from subregion
$M(S(g); \Theta_{t-1}^{[i]}, \Theta_0)$ of image $\mathcal{I}_0$ into subregion $S(g)$ to calculate $\Delta_{t-1,t}$, instead of the information
from $\mathcal{I}_{t-1}$.


Unfortunately, this restoration method has two significant limitations. Firstly, as shown in
figure 6.7, the use of the first frame makes it possible for two different particles to end up
projecting different parts of $\mathcal{I}_0$ into $S(g)$, which prevents us from caching the discrete,
bounded-displacement local likelihood energy functions $\psi^{[i]}$ for later reuse. This causes a large
performance penalty, particularly when all of a patch’s particles have been labelled as lost. E.g.
under the test data and parameters that we used for the tests that we will describe in the next
section, using this method to restore 50 lost particles for each patch meant spending a few
minutes to process each patch, rather than the 5-10 seconds that are spent when each $\psi^{[i]}$ is
cached. This limitation may be difficult to totally overcome, but the methods we will propose
in the concluding chapter for improving the importance sampler’s efficiency may help.


The second limitation is that this use of the information from the first frame is only valid when
the non-translational components of the patch’s state at time $t$ are small. When this is not the
case, the displacement that each $\psi^{[i]}$ assigns minimum energy to may still be close to correct (if
the rotational component is not too large), but these minimum energy displacements will not
be uniform over all $\psi^{[i]}$, and so multiplicatively combining the local likelihoods may not lead to
an importance sampling distribution that peaks close to the correct state. The implementation


of solutions to this problem is one of the areas we intend to focus on in future. For now, we 


will briefly discuss some ideas that may lead to solutions. 


In order to solve this limitation, we need to make an adjustment to our current Product-of- 
Local-Likelihoods model, which relies on the assumption that the time t — 1 particle states are 
not far from the true time t patch state. The simplest adjustment to make would be to still use 
the first image, but to replace the single Gibbs sampling iteration that we use to approximately 
sample from $\phi_t$ (as discussed in section 3.3.3) with multiple iterations. Unfortunately, it is not


clear how many iterations would be needed to achieve an adequate degree of convergence. 


Another alternative is to redefine the distributions that we sample each transformation
component from in a way that breaks the cycles of dependencies between them, so that sampling
a state for the $n$th component does not change the information that we used to define the
distributions that we sampled the first $n - 1$ components from. Distributions like this only
permit information to “flow forwards”, hence rendering multiple Gibbs sampling iterations
unnecessary.


One way to create such distributions would be to define them in such a way that the probability
they give to any given state of the current transformation component is a strictly-decreasing
function of the uncertainty that remains over the states of the transformation components that
are yet to be sampled. E.g. suppose we were given a particle state $\Theta_{t-1}^{[i]}$ and energy functions
$\psi^{[i]}(d; g)$ that use the first frame to define the likelihood energy of the event of a subregion
centred at point $g$ undergoing a displacement $d$. Furthermore, suppose we were only interested
in estimating the orientation $\theta_t^{[i]}$ and displacement $d_t^{[i]}$ of the particle at time $t$. Rather
than sampling the particle’s displacement first and then sampling its orientation, suppose we
started with an estimate $\hat{\theta}$ of its orientation, and let $\Theta'$ be an intermediate particle state with
displacement $d_{t-1}^{[i]}$ and orientation $\hat{\theta}$. We could define a distribution $f_D$ over the space
$D$ of displacements as normal, adding up the local likelihood energies of the transformation
induced by $\Theta'$ when it is displaced by each $d \in D$. So the uncertainty that would remain
after assuming the particle’s orientation to be $\hat{\theta}$ is given by $f_D$’s entropy:

$$\rho(\hat{\theta}) = -\sum_{d \in D} f_D(d) \ln f_D(d). \tag{6.59}$$

For an accurate choice of $\hat{\theta}$, $f_D$ would be close to a delta function (if the magnitude of any
non-rigid transformation was small), and so $\rho(\hat{\theta})$ would be close to zero. Under an inaccurate choice
of $\hat{\theta}$, there would be little agreement between the energy terms that each subregion contributes
to any hypothesised displacement, and so $\rho(\hat{\theta})$ should be large. Hence, we could define an
importance sampling distribution over the particle’s orientation as a strictly-decreasing function of
$\rho(\hat{\theta})$.
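The entropy criterion can be illustrated with a short sketch. It assumes that the displacement distribution is obtained by exponentiating the negated sum of the per-subregion energies and normalising, which is one plausible reading of the construction described above; the function name and the nested-list input format are hypothetical.

```python
import math


def displacement_entropy(energies_per_subregion):
    """Entropy of the displacement distribution implied by summed energies.

    energies_per_subregion[s][k] is the energy that subregion s assigns to
    the k-th candidate displacement under the assumed orientation.
    """
    n_displacements = len(energies_per_subregion[0])
    summed = [sum(e[k] for e in energies_per_subregion)
              for k in range(n_displacements)]
    shift = min(summed)  # subtract the minimum for numerical stability
    unnormalised = [math.exp(-(s - shift)) for s in summed]
    z = sum(unnormalised)
    probabilities = [u / z for u in unnormalised]
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)
```

When every subregion favours the same displacement the entropy is close to zero, whereas subregions that each favour a different displacement give an entropy close to the logarithm of the number of candidate displacements.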


The problems with this approach however are that it would be computationally expensive to use 
when estimating more than two transformation components, and the distributions for all but 


the last sampled component would fail to preserve the exponential rate of growth in sampling
probability ratios w.r.t. the number of subregions, so they would have greater uncertainty.
However this method could potentially be used to initialise the Product-of-Local-Likelihoods 
Gibbs sampler with an improved estimate of the particle’s orientation, in the hope of reducing 


the number of iterations that are needed. 


6.6 Results 


A direct validation of the results of the methods we have discussed in this chapter for estimating 
the likelihood function parameters and the test threshold would involve comparing the results 
we obtained to those we would have obtained if we had based our construction of the $D_\star$
difference sets on the true deformations that the myocardium underwent over thousands of
frames. Manually tracking approximately 30 landmarks over thousands of frames for each patient in order
to calculate this many deformation fields is an impractical task, so we are not in a position to 
carry out this form of validation. So instead, we will analyse the particle filter’s performance 
under our estimates of the likelihood function parameters by using the deformation fields that 
we do have to validate the sampled particle states. In addition, we will attempt to validate 
the accuracy of our loss test (and hence our test threshold selection method) by comparing the 
errors in the states of the lost particles to the errors in the states of the particles that are not 


labelled as lost. 


For each video sequence, we used this chapter’s methods to estimate the parameters for patches 
initially centred at 14 of our manually placed landmarks, and we tracked them over all frames 
that the deformation fields were defined in, so that we could validate the results. We used the 
resampling approach to restore lost particles whenever a patch had at least one non-lost particle, 
and left them to drift otherwise. The random perturbation bounds and the other parameters 
we used to simulate the patch difference sets are summarised in table 6.1. Screenshots of the 
results are shown in figures 6.9 and 6.10, and graphs of the trajectories of the weighted means 


of the particles are shown in figure 6.11. 


                                           Value
Parameter                                  $D_\psi$      $D_L$/$D_F$    $D_B$
Max. orientation perturbation              3°        5°         16°
Max. skew perturbation                     0.0375    0.075      0.2
Max. scale perturbation                    1.0375    1.075      1.25
Max. displacement perturbation             3p        5p         $\sqrt{2 \times 40^2}$p = 56.6p
No. deformation trajectories               5         10         10
No. frames per deformation trajectory      50        50         50
No. perturbations per frame                1         10         180

Table 6.1: This table summarises the parameters that we used to define the distributions of
the perturbations that we applied when constructing the $D_\star$ sets of patch/subregion differences.
As always, the unit p stands for pixels.

The fact that we only ever displace the subregions used in the calculation of the local
likelihoods means that the dissimilarity measures that we calculate between pairs of subregions are
always based on the comparison of pixels that do not quite correspond to each other, so it was
unnecessary for the perturbations used in the construction of $D_\psi$ to be as large as those used
in the construction of $D_L$/$D_F$. Furthermore, since each patch contains multiple subregions (we
gave percentiles of the distribution of the number of subregions per patch in our earlier tests in
table 3.2 on page 72), we did not need to apply multiple perturbations per frame when
constructing $D_\psi$. We used many more perturbations per frame in the construction of $D_B$ than
in the construction of $D_L$/$D_F$ because the range of perturbations for $D_B$ was bounded by the
importance sampler’s range of transformations, and hence was much larger. This range of
transformations and the other particle filter parameters that we used were exactly the same as
those listed in table 3.1 on page 72, with the exception that we changed the outlier cutoff $\rho$
for the $\xi$ dissimilarity function used in the evaluation of the patch likelihood functions $L_t$ from
0.75 to 0.99. The value of this parameter did not matter in that chapter, but for the analysis
we are about to present, we found that the greater degree of discrimination given by a larger
cutoff led to better results.


For each initial patch state $\Theta_0$, we used the deformation fields to calculate estimates $\Theta'_{[0,T]}$ of
the patch’s trajectory, using the method we described in section 6.4.2. We then calculated the
RSK decomposition of the estimated state at each time step, which allowed us to measure the
errors in each individual component of the trajectories of each particle $\Theta_t^{[i]}$.

Figure 6.8 (panels: (a) 230108.FO1, (b) 180608.HS): These figures show the ID numbers that we
used to refer to each of the patches that we tracked.

In order for us to be able to compare the errors across different components, we calculated
Euclidean errors between vertices, rather than directly calculating the errors in the state variable
components. More specifically, suppose, for example, that we wanted to calculate the rotational
error in particle $\Theta_t^{[i]}$. First of all we defined a transformation state $\Theta$ by setting its rotational
component to that of $\Theta_t^{[i]}$ and all other components to the corresponding components of $\Theta'_t$, and
then we considered the set of Euclidean errors between the four vertices of $V(\Theta'_t)$ and those of
$V(\Theta)$, and also between the estimated patch centres $d(\Theta'_t)$ and $d(\Theta)$. We calculated the skewing
and scaling component particle errors similarly. However for the displacement component, we
calculated the Euclidean error between the particle centre $d(\Theta_t^{[i]})$ and the time $t$ position of the
manually-tracked landmark, which is a more accurate reference than the deformation field. In
addition, we also calculated the overall particle error as the set of errors between $V(\Theta_t^{[i]})$’s
vertices and $V(\Theta'_t)$’s, and between $d(\Theta_t^{[i]})$ and $d(\Theta'_t)$.


6.6.1 Loss Test Results 


Let $E_t^{[i]}$ be the set of overall errors for the time $t$ state of the $i$th of the 50 particles we used for
a particular video. To test how well the loss test performed, we analysed the $E_t^{[i]}$ sets per frame
and per patch.

Figure 6.10: Screenshots of our particle filter’s performance over the 180608.HS video, taken at 5 frame
intervals. The quadrilaterals show the weighted means of the particles, and the mostly-blue crosses near
their centres show the positions of the particles. Panels: (c) Frame 47, (e) Frame 57, (f) Frame 62,
(g) Frame 67, (h) Frame 72, (i) Frame 77, (j) Frame 82, (k) Frame 87.

Figure 6.11: These graphs show the trajectories of the weighted means of the particles in each
video. The circular markers denote the positions of the weighted means at 5-frame intervals.

Figure 6.12: (a), (b) These figures show the changes in the distribution of the number of lost
and non-lost particles over time. The numbers are always multiples of 50, which is the number
of particles we used. This is because we resampled the lost particles whenever possible, meaning
that either all of a patch’s particles were lost after processing each frame, or none were. (c),
(d) These figures show the numbers of lost and non-lost particles per patch, accumulated over
all frames that they were tracked over. Again, the values are all multiples of 50.

Figure 6.13: (a), (b) Error percentiles per frame after partitioning particles according to
whether or not they were lost. (c), (d) Error percentiles per patch after partitioning particles
according to whether or not they were lost. A pair of dashed vertical lines is given for each
patch. The lines on the left mark the percentiles for lost particles, and the lines on the right
mark the percentiles for non-lost particles. Patch 69 in 230108.FO1 only has one vertical line,
because the loss test erroneously labelled all of its particles as lost.

For the per frame analysis, we constructed sets

$$E_{t,\top} \supseteq E_t^{[i]} \iff \omega_t^{[i]} = \top \tag{6.60}$$

of errors associated with particles that were considered to be lost at each time $t$, and sets

$$E_{t,\bot} \supseteq E_t^{[i]} \iff \omega_t^{[i]} = \bot \tag{6.61}$$

of errors for particles that were not considered to be lost at time $t$. Under an ideal loss test,
we would expect the overall errors of the lost particles to be larger than those of the non-lost 
particles, so we have plotted percentiles of each of these sets as functions of t, to see the extent 
to which this was true. The results are shown in the top two graphs of figure 6.13, and the top 
two graphs of figure 6.12 show the total number of particles that were/were not lost in each 


frame. 


For 230108.FO1, we see that for almost all time steps between frame 20 and frame 59 (the final
frame), 75% of the error terms from lost particles (i.e. the errors above the 25th percentile
line) were greater than 50% of the error terms from non-lost particles. Over the same range of 
frames, we also see that 50% of the lost particle errors were greater than 75% of the non-lost 
particle errors. Note also that the numbers of lost and non-lost particles were roughly equal 
over 75% of these frames. There was less difference between the lost and non-lost particle errors 


over the first 20 frames, but there were also fewer lost particles over these frames. 


A large proportion of particles were labelled as lost around the middle of the 180608.HS se- 
quence. This was a result of the large degree of myocardial deformation and the large change 
in surface inclination around the middle of the sequence (as shown in figure 5.3 on page 123), 
which demonstrates the limitations of our reliance on the initial image as a reference for our 
loss test. Over most of the sequence, 50% of the lost particles had errors greater than 50% of 


the non-lost particles.

For the per patch analysis, we constructed a set

$$E_{\Theta_0,\top} \supseteq E_t^{[i]} \iff \Theta_0^{[i]} = \Theta_0 \wedge \omega_t^{[i]} = \top \tag{6.62}$$

for each initial patch state $\Theta_0$, made up of the overall errors associated with the patch’s lost
particles, and a set

$$E_{\Theta_0,\bot} \supseteq E_t^{[i]} \iff \Theta_0^{[i]} = \Theta_0 \wedge \omega_t^{[i]} = \bot \tag{6.63}$$


of the errors associated with its non-lost particles. We have plotted some of the percentiles of 
each of these sets in the bottom two graphs of figure 6.13, along with the number of lost/non-lost
particles per feature in the bottom two graphs of figure 6.12. For the 230108.FO1 sequence,
most of the patches either had far more lost particles than non-lost particles, or vice versa. This 
means that either the lost particle errors were much more densely sampled than the non-lost 
particle errors, or vice versa, and so it is not fair to compare the percentiles for these patches. 
For the patches that had a more even distribution of lost/non-lost particles, namely patches 58, 
60 and 78, we see that the percentage of lost particle errors that exceeded 50% of the non-lost 
particle errors varies between 50% and a little under 75%, which is not too far from the per 


frame results above. 


For 180608.HS, a larger number of patches had more balanced distributions of lost/non-lost
particles. Of these patches, patch 40 had the best separation between lost particle errors and
non-lost particle errors, with 75% of the lost particle errors being greater than 75% of the
non-lost particle errors, and patches 41, 51 and 58 were next in line with 75% of the lost
particle errors being approximately 2 or more pixels greater than 50% of the non-lost particle errors. For
the remainder (i.e. patches 34, 36, 37, 49, 57), the percentages of lost particle errors greater 


than 50% of the non-lost particle errors were somewhere between 50% and 75%. 


On the face of it, these figures seem to suggest that there is much room for improvement in 
the loss test’s classification accuracy. However a significant limitation of this analysis is that it 
does not take into account the ambiguities in the states of some features; some of the patches, 


such as patch 78 in 230108.FO1 or patch 49 in 180608.HS, could slide around without their
appearance changing very noticeably. In the absence of strong additional (prior) information,
it is not possible for the loss test or the posterior distribution to discriminate between these 


ambiguous states. 


6.6.2 Particle Filter Accuracy 


While the previous section’s figures give a suggestion of how well the loss test performed, they 
are not indicative of the particle filter’s accuracy, as they do not take the particle weights 
into account. So for each type of error that we measured (the overall errors, and the errors 
for each transformation component type), we constructed pairs $(E_t^{[i]}, w_t^{[i]})$ of particle error sets


and particle weights, gathered these pairs together per patch over all frames, and then plotted 


percentiles of the resulting weighted distributions. 
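One common way of computing a percentile of a weighted sample is sketched below. It is an illustration only: the function name is hypothetical, and the convention used (the smallest value at which the cumulative weight reaches the requested fraction of the total, with no interpolation) is an assumption rather than a description of our plotting code.

```python
def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight reaches q percent of the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    threshold = q / 100.0 * total
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= threshold:
            return value
    return pairs[-1][0]  # reached only through rounding when q is 100
```

With equal weights this reduces to an ordinary (non-interpolated) percentile, while a heavily weighted particle pulls the percentile towards its own error.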


As shown at the top of figure 6.14, for 8 of 230108.FO1’s 14 patches (those with IDs 58, 59,
60, 61, 62, 67, 68 and 70), the 75th percentiles of the overall weighted errors lay between 10
and 20 pixels, and for 13 of 180608.HS’s 14 patches (all except for patch 51), the 75th overall
error percentiles lay between 13 and 20 pixels. The displacement errors for these patches were
generally smaller than this, with the first 6 of the 230108.FO1 patches listed above having
75th percentiles within about 8 pixels, and 11 of the 180608.HS patches mentioned above (all
except for patches 41 and 42) also having 75th percentiles within about 8 pixels. The rotation
results were similar: the 75th percentiles for 6 of the 230108.FO1 patches listed above were
within 8 pixels, and the same percentiles for 10 of the 180608.HS patches mentioned above
were also within about 8 pixels. The scaling results were not as good, however. Only patch
58 in 230108.FO1 had a 75th percentile close to 8 pixels; the 75th percentiles of the other 7
patches listed above were between 10 and 14 pixels. For 180608.HS, only 3 patches had 75th
percentiles less than 8 pixels; the 75th percentiles of the other 10 were between 10 and 21 pixels.
The skewing results were better than the scaling results, but not as good as the displacement
and rotation results: 2 of the 8 230108.FO1 patches listed above had 75th percentiles within 8
pixels, while the 75th percentiles of the remainder were between 9 and 13 pixels; 5 of the 13
mentioned 180608.HS patches had 75th percentiles that were less than 8 pixels, while 7 of the
remaining 75th percentiles lay between 9 and 14 pixels.


The overall errors and displacement errors for patch 51 in 180608.HS were worse than those of
the other patches due to the patch being occluded by some loose tissue on the right-hand side
towards the end of the sequence (see frames 77-87 of the screenshots in figure 6.10). The 6 patches
of 230108.FO1 that we did not list above (patches 64, 69, 71, 72, 78 and 84) performed worse
for similar reasons, e.g. specular highlights occluded some of the textural details that patch
69 was tracking, and some of patch 71’s textural detail was lost when it got too close to the
image boundary. Despite the overall errors for these patches being worse, it is interesting to
note that some of their transformation component errors were small, presumably as a result
of the smoothness of the myocardium’s deformations. For example, the 75th percentile of patch 51’s
skewing errors in 180608.HS was about 4 pixels, and the same percentile of patch 64’s rotation
errors in 230108.FO1 was about 6 pixels.


6.6.3 Comparative Tests


We performed two comparative tests with each video: one to see the effects of only estimating
the displacements of the patches, rather than full affine transformations; another to compare
the performance of our particle filter to that of a standard tracking algorithm from the
literature. For the former comparison, we carried out the parameter estimation procedures that
we have described in this chapter as normal, except that we set all non-translational
transformation components to the identity transformation, so that the transformations involved in the
construction of the $D$ image difference sets were all displacements. For the latter comparison,
we used the OpenCV implementation of the pyramidal KLT tracker (p-KLT) described
in [14] (with 3 pyramid levels), which augments the standard KLT tracker [66, 112, 99] with a
coarse-to-fine scheme, so as to allow for the estimation of large displacements.


It is interesting to compare the particle filter’s performance when estimating full affine
transformations to its performance when only estimating displacements for two reasons. Firstly, it
is interesting to see how well our particle filter copes under the simplest useful transformation
model. Secondly, since the space of displacements is small enough for our importance sampler
to be able to sample from it directly, without having to resort to Gibbs sampling (as it does
when sampling affine transformations), it is interesting to see whether or not the errors that
result from the fact that we only carry out a single Gibbs iteration have an adverse effect on
the estimation of the displacement component when using full affine transformations.


Figure 6.14: These figures show weighted percentiles of the overall patch errors and displacement
patch errors.

Figure 6.15: These figures show weighted percentiles of the rotational and scaling patch errors.

Figure 6.16: These figures show weighted percentiles of the skewing patch errors.


Figure 6.17 shows the weighted displacement error distributions for the first test. Most of the
percentiles we have plotted differ by no more than 2 or 3 pixels, suggesting that a single Gibbs
iteration was usually sufficient to estimate displacements when using affine transformations, and
also that in most cases our particle filter was able to cope with the appearance changes caused by
non-translational deformations when we were only estimating displacements. Interestingly, the
patches of 230108.FO1 for which there are large differences between the two tests are mostly the
6 patches that had large overall errors in the results that we presented in the previous section.
With the exception of the 95th percentile of patch 69, the error percentiles for patches 64, 69,
72 and 84 were smaller when we tracked full affine transformations, but the error percentiles
for patches 71 and 78 were smaller when we only tracked displacements. The 75th and 95th
percentiles for patch 51 in 180608.HS (which had the largest overall errors in that video at the
95th percentile) were also much larger when we tracked full affine transformations, as was the
95th percentile of patch 41.


Figure 6.18 shows the results of our comparison of our particle filter with the p-KLT tracker. For
each frame of the videos, the results show percentiles of the distribution of displacement errors
over all patches, measured with reference to the manually tracked landmarks (for our particle
filter, we compared the landmarks to the weighted mean particle displacements). We ran the
p-KLT tracker in two different ways: in one run, we used the first frame as its reference image
and centred its reference patches over the landmarks in that frame, and in the other, we used
the previous frame as its reference image and centred the reference patches over its estimates
of the patch positions in that frame. In both cases, we took the most recently estimated patch
positions as the initial estimates of their new positions. To make the comparison fair, we set
the patches that we used with the p-KLT tracker to roughly the same size as the patches that
we used with our particle filter (they could not be exactly the same size, as the p-KLT tracker
required the dimensions to be odd, so we used 101 × 101 patches for 230108.FO1 and 81 × 81
patches for 180608.HS, as opposed to the 100 × 100 and 80 × 80 patches that we used with our
particle filter).


(b) 180608.HS

Figure 6.17: These figures show some percentiles of the weighted displacement error
distributions for each patch in the tests in which we compared the particle filter’s performance
when estimating full affine transformations to its performance when only estimating patch
displacement. The left-hand column of each patch’s results gives the error percentiles from the
tests in which we estimated full affine transformations, and the right-hand column gives the
displacement-only percentiles.


The p-KLT tracker generally performed better when we used the previous frame as a reference
rather than the first frame. When using the first frame, it performed particularly poorly on
the 180608.HS video, declaring that it had lost all of the patches within the first 15 frames.
But even when using the previous frame, the 25th percentiles of its errors were larger than our
particle filter’s 75th error percentiles in almost every frame, meaning that over 75% of its errors
were larger than 75% of our particle filter’s errors most of the time.


6.7 Conclusion 


In this chapter, we have described a way of estimating the parameters of our particle filter’s
likelihood functions by using the DTMs we described in the previous chapter to simulate
deformations of the regions around each patch, from which we construct sample sets of patch/subregion
differences that we pass to a maximum-likelihood parameter estimator. We have also described
initial solutions to the problems of determining when particles are irretrievably lost (based on a
background/foreground likelihood ratio test), and restoring lost particles (based on resampling
from the set of particles that are not considered to be lost). Our tests so far seem to indicate
that more work needs to be done to improve the classification accuracy of our particle loss test,
e.g. by taking into account recent information about the appearance of the patch.


We have presented test results that suggest that our particle filter estimates the displacements
and orientations of patches reasonably well, but that it struggles to estimate their scale and
skew. It may well be that the patches do not contain enough information to estimate scale
and skew accurately, so taking into account the dependencies between the patches seems like
the most promising way of improving the estimation accuracy. Our comparison of our particle
filter with the pyramidal KLT tracker [14] suggests that the latter is incapable of achieving
the degree of displacement estimation accuracy over myocardial image sequences that we have
achieved through the techniques laid out in this thesis.


Chapter 7 


Conclusion 


7.1 Summary of Thesis Achievements 


In this thesis, we have described methods that we have designed and tested as part of our
research into reliable approaches by which the motion of the myocardial surface can be
estimated, for use in image-guided surgery. Our methods were built upon a
Sequential-Importance-Sampling-based particle filtering framework (with generalised dependency assumptions) for
tracking affine changes in the states of surface patches, with an emphasis on information that
can be extracted from the images in the absence of explicit prior assumptions.


In chapter 3, we introduced a Product-of-Experts-inspired model of the likelihood component of
a patch’s posterior distribution. We showed that the uncertainty of the event of a subregion
undergoing a given transformation can be reduced by multiplicatively combining estimates of the
likelihood of that transformation for each subregion (as evaluated by local likelihood functions),
and that this idea could be combined with Gibbs sampling to define low-variance importance
sampling distributions. We also described conditions under which it seems reasonable to assume
an exponential rate of growth in the ratio between importance sampling probabilities, and we
validated this assumption empirically. This allowed us to reformulate the likelihood product
approach in a way that accounts for the common event of subregion likelihood terms being
unobservable. The reformulation expressed the importance sampling energy as the product of
the mean energy of the observable likelihood terms and an order of magnitude term.


In chapter 4, we described the dissimilarity function that we use to define the likelihood
functions from the previous chapter. We treated the problem of measuring the dissimilarity between
two patches as one of summarising the weighted distribution of squared differences between
them, and we proposed the use of the Earth Mover’s Distance between the bottom 100p% of
this distribution and the Dirac delta function as a summariser, with a degree of sensitivity to
outliers controlled by p. To make the dissimilarity function invariant to appearance changes
caused by changes of illumination, we devised a simple image filtering algorithm based on a
variant of the Dichromatic Reflection Model, that uses parametric surfaces to separate local
lightness variations caused by albedo variations from those caused by illumination geometry
variations.


Our aim in chapter 5 was to develop low-dimensional, periodic, generative models of affine and
non-affine myocardial deformations, to assist in our attempts to automatically parameterise
the particle filter’s likelihood functions. We constructed training data sets by fitting
Thin-Plate-Splines-based deformation fields to sequences of manually tracked landmarks, randomly
sampling a set of quasi-uniformly distributed similar canonical grids that lay within the images
of the deformation fields, deforming them with the fields and removing the affine components of
the resulting grids. This allowed us to create PCA models of the main modes of non-affine grid
deformation much like the well-known Point Distribution Model. We then used these models to
form sequences of reduced-dimensionality representations of the non-affine grid deformations,
to which we appended log-space representations of the affine transformations, and produced
new PCA models of the distribution of periodic B-spline approximations of these sequences,
which we refer to as Deformation Trajectory Models (DTMs).


Finally, in chapter 6, we introduced an idea for testing whether or not a particle is “lost” (i.e.
whether or not it has fallen into a region of low marginal posterior probability) by comparing
“foreground” and “background” likelihood functions of its state. Using our DTMs,
we were able to construct sets of patch/subregion differences, from which we could calculate
maximum likelihood estimates of: the parameters of these likelihood functions, the parameters
of the importance sampler’s local likelihood functions, and the test threshold. The background
likelihood function was simply represented as a smoothed histogram, and we were able to use
standard gradient-based optimisation methods to estimate the remaining likelihood function
parameters under the assumption that these functions were stretched exponentials. The
objective function for the test threshold was not differentiable everywhere, however, and it usually
had numerous local minima, so we developed a modified version of the dichotomous search
algorithm to minimise it. In addition to this, we also described some of our initial attempts at
restoring lost particles by resampling them or using information from the first frame.


7.2 Future Work


Improvements to the efficiency and accuracy of our particle filtering system must be made before
it will be suitable for image-guidance in a clinical setting. The main bottleneck at present is
the calculation of the local likelihood functions. All of the tests we have done in this thesis have
involved evaluating local likelihood functions over 41 × 41 grids, with a median of about 50
subregions per patch, which means that the lower bound on the median number of dissimilarity
function evaluations per patch is about 50 × 41² = 84050. Caching helps to prevent this lower
bound from growing much larger when the particles are in close agreement. But when caching
cannot be used, this lower bound grows by a factor of the number of particles, which was 50 in
our tests, giving 50 × 84050 = 4202500 dissimilarity function evaluations.
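
The arithmetic above can be reproduced with a few lines of code, which also makes it easy to see how the cost scales with the grid size. This is only an illustrative sketch; the constant and function names are hypothetical.

```python
GRID_SIDE = 41          # local likelihoods are evaluated over a 41 x 41 grid
MEDIAN_SUBREGIONS = 50  # median number of subregions per patch
NUM_PARTICLES = 50      # particles per patch in the tests


def evaluations_per_patch(grid_side, subregions):
    """Dissimilarity evaluations needed when every grid point is evaluated once."""
    return subregions * grid_side ** 2


with_full_caching = evaluations_per_patch(GRID_SIDE, MEDIAN_SUBREGIONS)
without_caching = NUM_PARTICLES * with_full_caching
```

Halving the grid side to 21 would reduce the per-patch count to 50 × 21² = 22050, which is why the sparse evaluation strategies discussed in this section are attractive.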

We calculated the local likelihoods at every point in the grid for simplicity. However, we should
be able to significantly reduce the amount of work by making use of the following observations:


a) local likelihood functions are locally quite smooth, due to the smoothness of
appearance of most parts of the myocardial surface;

b) local likelihood functions for similar-looking neighbouring subregions are often similar in
appearance, up to a coordinate system translation;

c) a time t+1 local likelihood function for a particular subregion often looks similar to a time
t local likelihood function for a subregion in an approximately corresponding position.


It may thus be possible to achieve good estimates of the local likelihood functions by
evaluating the dissimilarity function at a well-selected sparse set of points, and then predicting the
likelihood values everywhere else. So one aspect of our future research will involve investigating
effective methods for selecting the evaluation points and carrying out the predictions.
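
As a purely illustrative sketch of this idea, the code below evaluates a function on a regular coarse subgrid and predicts the remaining values by bilinear interpolation. Both the regular subgrid and the bilinear predictor are assumptions made for the example (the text leaves the selection and prediction methods open), and `sparse_bilinear` is a hypothetical name.

```python
def sparse_bilinear(func, size=41, step=4):
    """Evaluate func(x, y) at every `step`-th grid point only, and fill in the
    rest of a size x size grid by bilinear interpolation.

    Returns the dense grid (indexed as grid[y][x]) and the number of
    evaluations of func that were actually performed.
    """
    if step < 1 or (size - 1) % step != 0:
        raise ValueError("step must divide size - 1 so the coarse grid spans the grid")
    coarse = {(x, y): func(x, y)
              for x in range(0, size, step)
              for y in range(0, size, step)}
    grid = [[0.0] * size for _ in range(size)]
    for y in range(size):
        y0 = min((y // step) * step, size - 1 - step)  # lower coarse row
        ty = (y - y0) / step
        for x in range(size):
            x0 = min((x // step) * step, size - 1 - step)  # left coarse column
            tx = (x - x0) / step
            low = (1 - tx) * coarse[(x0, y0)] + tx * coarse[(x0 + step, y0)]
            high = (1 - tx) * coarse[(x0, y0 + step)] + tx * coarse[(x0 + step, y0 + step)]
            grid[y][x] = (1 - ty) * low + ty * high
    return grid, len(coarse)


# A smooth toy function standing in for a local likelihood energy.
grid, evaluations = sparse_bilinear(lambda x, y: 2 * x + 3 * y + 1)
```

With a step of 4, only 11 × 11 = 121 of the 1681 grid points are evaluated directly; how well the prediction works in practice depends on how smooth the real local likelihood functions are.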


Another important observation is that all values of all local likelihood functions for all particles
can be calculated independently of each other. So we may also be able to gain a significant
speedup by exploiting GPGPU parallelisation and/or multicore CPUs.
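
Because the evaluations are independent, they map directly onto a worker pool. The sketch below shows the pattern with Python's standard `concurrent.futures` module; `toy_dissimilarity` is a hypothetical stand-in for a real dissimilarity evaluation, and the task layout is an assumption made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_dissimilarity(task):
    """Hypothetical stand-in for one dissimilarity function evaluation."""
    particle, subregion, dx, dy = task
    return 0.01 * (particle + subregion) + dx * dx + dy * dy


def evaluate_all(tasks, dissimilarity=toy_dissimilarity,
                 executor_cls=ThreadPoolExecutor, workers=4):
    """Evaluate every task independently; results keep the order of `tasks`."""
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(dissimilarity, tasks))


# One task per (particle, subregion, grid offset) combination.
tasks = [(p, s, dx, dy)
         for p in range(2) for s in range(3)
         for dx in range(-1, 2) for dy in range(-1, 2)]
results = evaluate_all(tasks)
```

In CPython, a thread pool does not speed up CPU-bound pure-Python work; passing `ProcessPoolExecutor` as `executor_cls`, or moving the evaluation to a GPU, would be needed for a real speedup.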


While we were able to greatly reduce the effects of the aperture problem through the use of
the Product-of-Experts framework, it did cause problems when we were tracking patches with
little texture, and patches that lay on vessels that were locally nearly straight. It seems that
the only way to resolve this issue will be to take into account the dependencies between all
of the patches we are tracking, e.g. by extending our Product-of-Experts approach to operate
over all patches simultaneously. This would be a non-trivial extension to our current approach,
however, as we cannot assume that a single global affine transformation will account for the
motion of all patches. If we can rebuild our deformation mode models and DTMs over larger
areas, it might be possible to use them to enforce prior constraints on the set of deformations
that the patches can simultaneously undergo. But this will only lead to a statistically efficient
solution if we can find a way to directly incorporate these constraints into the calculation of
the importance sampling distributions.


We also intend to extend our loss test to incorporate information from recent images, and to
develop effective particle restoration techniques for the case when all of a patch’s particles are
labelled as lost. If the methods we described in the previous chapter prove to be infeasible or
unhelpful, another potentially interesting avenue to explore would be a statistical reformulation
of one of the many feature detection algorithms that have been published in the computer vision
literature.


Finally, before our methods can be applied to more general (i.e. non-surgical) image sequences,
it will be necessary to incorporate some form of motion segmentation process into the calculation
of the importance sampling distributions and the patch likelihood function. For the importance
samplers, this could potentially be done implicitly by replacing the calculation of mean energies
with an outlier-insensitive average, such as the median. A more explicit approach would be
needed for the patch likelihood function, however, so as to maintain its discriminative power.
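
The effect of swapping the mean for the median can be seen in a toy example. The energies below are hypothetical values for five subregions, one of which lies on an independently moving object; `summarise_energies` is an illustrative helper rather than part of our system.

```python
from statistics import mean, median


def summarise_energies(energies, robust=False):
    """Summarise subregion energies with the mean, or with the median if robust."""
    if not energies:
        raise ValueError("at least one energy is required")
    return median(energies) if robust else mean(energies)


# Four consistent subregions and one outlier from a differently moving region.
energies = [0.9, 1.0, 1.1, 1.2, 9.0]
mean_energy = summarise_energies(energies)
median_energy = summarise_energies(energies, robust=True)
```

Here the mean is pulled up to about 2.64 by the single outlier, while the median stays at 1.1, the energy of a typical subregion.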


Appendix A 


Notational Conventions 


Symbol    Meaning    Page
$a, b, \alpha, \beta$, etc.    Column vectors
$\overrightarrow{ab}$    The directional vector $b - a$
$a \times b$    The 2D cross product    128
$\triangle abc$    The triangle with vertices $a$, $b$ and $c$
$\square abcd$    The quadrilateral with vertices $a$, $b$, $c$ and $d$, which form a loop when connected in that order
$a, b, A, B, \theta, \Phi$, etc.    Random vectors
$a, b, A, B, \theta, \Phi$, etc.    Random scalars and any other type of random variable except for random vectors
$\epsilon$    A random noise vector
$a \sim f$    $a$ is sampled from PDF/PMF $f$
$E[a]$    The expected value of $a$
$E_p[a]$    The expected value of $a$ under PDF $p$, i.e. $\int a\,p(a)\,da$
$a \mid b$    The random variable that results from conditioning $a$ on $b$
$A, B, \Theta, \Phi$, etc.    Matrices
$A^\top$    The transpose of $A$


$A^{1/2}$    The square root of $A$, i.e. a matrix $B$ such that $BB = A$
The zero vector or matrix
The $m \times n$ zero matrix
The identity matrix
The $n \times n$ identity matrix
Indices $i$ to $j$ of $x$, where $x$ is any indexable object
$i{:}j$, where $i > j$    An empty index list
$[a, b, c, \ldots]$    The list of items $a, b, c, \ldots$
$[a, b]$    The list of items $a$ and $b$, or the closed interval $\{x : a \leq x \leq b\}$. We indicate whenever we mean the closed interval.
Row $i$ and column $j$, respectively, of $A$
Rows $i$ to $j$ and columns $k$ to $l$, respectively, of $A$
$\|A\|_F$    The Frobenius norm of $A$, defined as $\|A\|_F = \sqrt{\sum_{i,j} A_{i,j}^2}$
The empty set
Upper and lower bounds, respectively, of $x$
$\delta_{ij}$    The Kronecker Delta function, defined as $\delta_{ij} = 1 \iff i = j$, $\delta_{ij} = 0 \iff i \neq j$
$M(p; \theta_1, \theta_2)$    The affine coordinate transformation that transforms $p$ by the inverse of the transformation defined by $\theta_1$, and then transforms the result by the transformation defined by $\theta_2$    53
The Gaussian distribution over $x$ with mean vector $\mu$ and covariance matrix $\Sigma$
The boundary of set $X$
The unnormalised stretched exponential function of $x$, with exponent $\alpha$ and a given inverse decay rate    53
$V(\theta)$    The four vertices of the patch after application of the transformation defined by $\theta$    42


Appendix B 


The After Math 


B.1 Calculating the Decomposition A = RSK 


The QR decomposition is one of the standard matrix decompositions in linear algebra, in which
a matrix $A$ is decomposed into the product of an orthogonal matrix $Q$ and an upper-triangular
matrix $U$ (this matrix is usually referred to as $R$, but we will call it $U$ to avoid confusion with
the rotation matrix $R$), i.e.

$$A = QU, \tag{B.1}$$

where $Q$ and $U$ are square matrices satisfying

$$Q^\top Q = I, \qquad U_{i,j} = 0 \quad \forall\, i > j. \tag{B.2}$$


Many numerical libraries with linear algebra support can calculate this decomposition, using
methods such as Householder reflections, Givens rotations or the Gram-Schmidt process (see
[40] for details). However, for the 2 × 2 matrices we are dealing with, the decomposition can
be calculated with simple closed-form expressions, minimising the overhead introduced by the
generality of numerical library functions designed to support matrices of arbitrary size:

$$Q = \begin{bmatrix} x & -y \\ y & x \end{bmatrix}, \qquad U = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}, \tag{B.3}$$

where

$$a = \sqrt{A_{1,1}^2 + A_{2,1}^2}, \quad b = \frac{A_{1,1}A_{1,2} + A_{2,1}A_{2,2}}{a}, \quad c = \frac{|A|}{a}, \quad x = \frac{A_{1,1}}{a}, \quad y = \frac{A_{2,1}}{a}. \tag{B.4}$$


The columns of $Q$ are clearly orthogonal to each other, and the squared magnitude of each
column is

$$\left(\frac{A_{1,1}}{a}\right)^2 + \left(\frac{A_{2,1}}{a}\right)^2 = \frac{A_{1,1}^2 + A_{2,1}^2}{A_{1,1}^2 + A_{2,1}^2} = 1, \tag{B.5}$$

hence $Q$ is orthogonal. In fact $|Q| = 1$ as well*, so $Q$ is the rotation matrix $R$.


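The correctness of $U$ can be checked by comparing $U$ to $Q^{-1}A = Q^\top A$:

$$Q^\top A = \begin{bmatrix} x & y \\ -y & x \end{bmatrix} A = \frac{1}{\sqrt{A_{1,1}^2 + A_{2,1}^2}} \begin{bmatrix} A_{1,1}^2 + A_{2,1}^2 & A_{1,1}A_{1,2} + A_{2,1}A_{2,2} \\ 0 & A_{1,1}A_{2,2} - A_{2,1}A_{1,2} \end{bmatrix}. \tag{B.6}$$

This clearly simplifies to the expressions given above for $U$’s parameters, as required.

Finally, we must solve

$$SK = U, \tag{B.7}$$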


where $S$ is a diagonal matrix and $K$ is upper unitriangular (i.e. it is upper triangular and the
elements on its main diagonal are 1). The solution to this is trivial:

$$S = \operatorname{diag}(a, c), \qquad K = S^{-1}U. \tag{B.8}$$
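
The closed-form decomposition can be sketched in a few lines of code. This is an illustrative implementation of the expressions in this section (with $a = \sqrt{A_{1,1}^2 + A_{2,1}^2}$, $b = (A_{1,1}A_{1,2} + A_{2,1}A_{2,2})/a$ and $c = |A|/a$), not the code used in our experiments; the function names are hypothetical, and matrices are represented as nested lists.

```python
import math


def decompose_rsk(A):
    """Split a non-singular 2x2 matrix A into rotation R, diagonal S and
    upper unitriangular K such that A = R S K."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    a = math.hypot(a11, a21)  # length of A's first column
    if a == 0.0 or det == 0.0:
        raise ValueError("A must be non-singular")
    b = (a11 * a12 + a21 * a22) / a
    c = det / a
    x, y = a11 / a, a21 / a   # cosine and sine of the rotation angle
    R = [[x, -y], [y, x]]
    S = [[a, 0.0], [0.0, c]]
    K = [[1.0, b / a], [0.0, 1.0]]  # K = S^{-1} U with U = [[a, b], [0, c]]
    return R, S, K


def matmul2(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2.0, 1.0], [1.0, 3.0]]
R, S, K = decompose_rsk(A)
A_rebuilt = matmul2(R, matmul2(S, K))
```

Since $|R| = 1$ by construction, the sign of $|A|$ is carried entirely by $c$; for the matrices considered in this thesis $|A| > 0$, so both entries of $S$ are positive.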


*When using numerical libraries to calculate the decomposition, $Q$ may have a negative determinant, in
which case its $i$th column and the $i$th row of $U$ must be scaled by $-1$, for arbitrary $i$.


B.2 The Existence and Realness of 2 × 2 Matrix Logarithms


[74] gives the only cases under which a 2 × 2 matrix has a real logarithm. In the case of our $A(\theta)$
matrices (upon which we enforce the restriction $|A(\theta)| > 0$), the conditions are that either:


1. $A(\theta)$ has equal negative eigenvalues and is diagonalisable;
2. $A(\theta)$ is positive-definite; or
3. $A(\theta)$’s eigenvalues are both complex.


Condition 1 is true if and only if $A(\theta)$ is a negative isotropic scaling matrix, i.e. if $A(\theta) = -aI$,
for some $a > 0$.


Condition 2 is true if and only if $\operatorname{tr}(A(\theta)) > 0$ (since $|A(\theta)| > 0$). Expanding eq. (3.2) and
assuming $\theta_{[ky]} = 0$, it follows that this is equivalent to the condition

$$(\theta_{[sx]} + \theta_{[sy]})\cos(\theta_{[\theta]}) + \theta_{[sx]}\theta_{[kx]}\sin(\theta_{[\theta]}) > 0. \tag{B.9}$$

It is difficult to give an intuitive description of the geometric significance of this, but a special
case of interest is that it is always true when $\theta_{[kx]} = 0$ and $|\theta_{[\theta]}| < 0.5\pi$ radians.


Another way to think about the significance of $\operatorname{tr}(A(\theta)) > 0$ is to use the visualisation shown
in figure B.1, in which

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{B.10}$$

is represented by a parallelogram that depicts its effect on a unit square with a vertex at the
origin. The positive trace condition means that either the x ordinate of the bottom-right vertex
of the parallelogram must be less negative than the y ordinate of the top-left vertex is positive,
or that the y ordinate of the top-left vertex must be less negative than the x ordinate of the
bottom-right vertex is positive.


Figure B.1: The vertices of the red unit square are left-multiplied by a matrix $A$ defined as in
the text, transforming the square into the blue parallelogram, which has area $|A|$.

For condition 3, we use the form of $A$ given in figure B.1 again. By calculating $A$’s characteristic
polynomial and using the quadratic formula, it follows that $A$’s eigenvalues $\lambda$ satisfy

$$\lambda = \frac{a + d \pm \sqrt{(a + d)^2 - 4(ad - bc)}}{2}. \tag{B.11}$$

Therefore, the eigenvalues are complex if and only if the discriminant is negative, i.e. if and
only if

$$(a + d)^2 - 4(ad - bc) < 0 \iff \left|\frac{\operatorname{tr}(A)}{2}\right| < \sqrt{|A|}. \tag{B.12}$$

In terms of figure B.1, the left-hand side of this inequality is the magnitude of the mean value of
the x ordinate of the bottom-right vertex of the parallelogram and the y ordinate of its top-left
vertex, and the right-hand side is the side length of a square that is equal to the parallelogram
in area.
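
The three conditions can be turned into a small classification routine. The sketch below is an illustration only: `real_log_case` is a hypothetical name, the tolerance used to recognise $-aI$ is an arbitrary choice, and the complex-eigenvalue case is tested first so that each matrix is assigned to exactly one condition.

```python
def real_log_case(A, tol=1e-12):
    """Return which of conditions 1-3 a 2x2 matrix with positive determinant
    satisfies, or None if it has no real logarithm."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det <= 0:
        raise ValueError("the determinant of A must be positive")
    trace = a + d
    if trace * trace - 4.0 * det < 0:
        return 3  # negative discriminant: complex eigenvalues
    # Real eigenvalues with a positive product share the sign of the trace.
    if trace > 0:
        return 2  # both eigenvalues are positive
    # Both eigenvalues are negative: a real logarithm requires A = -a I.
    if abs(b) <= tol and abs(c) <= tol and abs(a - d) <= tol:
        return 1
    return None


quarter_turn = [[0.0, -1.0], [1.0, 0.0]]  # rotation by 90 degrees
sheared = [[2.0, 1.0], [0.0, 3.0]]        # positive eigenvalues 2 and 3
flipped = [[-2.0, 0.0], [0.0, -2.0]]      # -2 I
uneven = [[-1.0, 0.0], [0.0, -2.0]]       # distinct negative eigenvalues
```

With these inputs, `quarter_turn` falls under condition 3, `sheared` under condition 2 and `flipped` under condition 1, while `uneven` has no real logarithm.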


B.3 Calculating the Discretisation Resolution Parameters


In this section, we derive the parameters that control the discretisation resolution of the sets
R, K and S that we introduced in section 3.3.3. Our derivations are based on a patch $V(\theta)$*
with central point $d(\theta)$. In all cases, the aim is to find a transformation parameter update $\delta_x$
for transformation type $x$ (which can be any of the transformation parameter labels used in eq.
(3.2)) such that the directional vector

$$M(q'; \theta, \theta^{+x}(\delta_x)) - q' = A(\theta^{+x}(\delta_x))A^{-1}(\theta)(q' - d(\theta)) + d(\theta^{+x}(\delta_x)) - q' \tag{B.13}$$

is a unit vector (i.e. $\delta_x$ moves $q'$ one pixel), where $q'$ is the furthest point in $V(\theta)$ from $d(\theta)$,
under some appropriate measure of distance, and $\theta^{+x}(\delta_x)$ is the result of updating $\theta_{[x]}$ by $\delta_x$,
but keeping $\theta$’s other parameters the same. In particular, for rotation and skew:

$$\theta^{+x}_{[x]}(\delta_x) = \theta_{[x]} + \delta_x, \tag{B.14}$$

whilst for scaling:

$$\theta^{+x}_{[x]}(\delta_x) = \theta_{[x]}\delta_x. \tag{B.15}$$

Since we only need to calculate the discretisation parameters for rotation, skewing and scaling,
we will always have $d(\theta) = d(\theta^{+x}(\delta_x))$. So if we define

$$q = q' - d(\theta), \tag{B.16}$$

then our objective is to find $\delta_x$ such that

$$1 = \left\|A(\theta^{+x}(\delta_x))A^{-1}(\theta)q - q\right\|^2. \tag{B.17}$$

*Please see section 3.2 for a definition of this symbol and all other symbols that are not defined in this
section.

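B.3.1 The Rotational Discretisation Parameter $\delta_\theta$


Let $q'$ be the furthest vertex in $V(\theta)$ from $d(\theta)$, measuring distances with the Euclidean norm.
The choice of $q'$ is arbitrary when more than one vertex is at least as far from the centre as all
the others.

The rotation update $\delta_\theta$ that displaces $q'$ by 1 pixel satisfies:

$$1 = \left\|R(\theta^{+\theta}(\delta_\theta))R^{-1}(\theta)q - q\right\|^2, \tag{B.18}$$

since $S(\theta^{+\theta}(\delta_\theta)) = S(\theta)$ and $K(\theta^{+\theta}(\delta_\theta)) = K(\theta)$. The matrix $R(\theta^{+\theta}(\delta_\theta))R^{-1}(\theta)$ is simply
an anticlockwise rotation by $\delta_\theta$, which we will denote by the rotation matrix $R'$.

We then have

$$\begin{aligned} 1 &= \|R'q - q\|^2 \\ &= (q^\top R'^\top - q^\top)(R'q - q) \\ &= 2q^\top q - 2q^\top R'q, \end{aligned} \tag{B.19}$$

since $R'^\top R' = I$.

Hence

$$\begin{aligned} \tfrac{1}{2} &= q^\top q - q^\top \begin{bmatrix} \cos(\delta_\theta) & -\sin(\delta_\theta) \\ \sin(\delta_\theta) & \cos(\delta_\theta) \end{bmatrix} q \\ &= q^\top q - q_x(q_x\cos(\delta_\theta) - q_y\sin(\delta_\theta)) - q_y(q_x\sin(\delta_\theta) + q_y\cos(\delta_\theta)) \\ &= q^\top q(1 - \cos(\delta_\theta)) \\ \Rightarrow \delta_\theta &= \arccos\left(1 - \frac{1}{2q^\top q}\right). \end{aligned} \tag{B.20}$$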
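
Eq. (B.20) can be checked numerically by rotating a vertex by the computed angle and measuring how far it moves. The following is a small illustrative sketch; the function names and the example vertex are hypothetical.

```python
import math


def rotation_step(q):
    """Rotation update (radians) that moves the centred point q by one pixel,
    i.e. arccos(1 - 1 / (2 q.q)) as in eq. (B.20)."""
    qq = q[0] * q[0] + q[1] * q[1]
    if qq < 0.25:
        # A point closer than half a pixel to the centre can never move a
        # full pixel under rotation, and arccos would be undefined.
        raise ValueError("q is too close to the centre of rotation")
    return math.acos(1.0 - 1.0 / (2.0 * qq))


def rotate(q, angle):
    """Rotate q anticlockwise about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * q[0] - s * q[1], s * q[0] + c * q[1])


q = (50.0, 50.0)  # corner of a 100 x 100 patch, relative to its centre
delta = rotation_step(q)
moved = rotate(q, delta)
shift = math.hypot(moved[0] - q[0], moved[1] - q[1])
```

For this vertex the update is roughly 0.014 radians (about 0.8 degrees), and `shift` evaluates to one pixel up to floating-point error.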


B.3.2 The Skewing Discretisation Parameters $\delta_{kx}$ and $\delta_{ky}$


The rotation matrix $R(\theta) = R(\theta^{+kx}(\delta_{kx}))$ has no effect on the magnitude of the displacement
induced on any point by the horizontal skew update parameter $\delta_{kx}$, so we can omit it from $A$.
Since $K(\theta)$ is the first factor of $A(\theta)$ that is applied, let $v$ be a vertex of $V$ with maximal
y ordinate magnitude, and define $q' = S(\theta)K(\theta)v + d(\theta) \Rightarrow q = S(\theta)K(\theta)v$. The $\delta_{kx}$ that
displaces $q'$ by one pixel satisfies

$$\begin{aligned} 1 &= \left\|S(\theta)K(\theta^{+kx}(\delta_{kx}))K^{-1}(\theta)S^{-1}(\theta)q - q\right\|^2 \\ &= \left\|S(\theta)(K(\theta) + K')v - S(\theta)K(\theta)v\right\|^2 \\ &= \left\|S(\theta)K'v\right\|^2, \end{aligned} \tag{B.21}$$

where

$$K' = \begin{bmatrix} 0 & \delta_{kx} \\ 0 & 0 \end{bmatrix}. \tag{B.22}$$

As the bottom row of $K'$ is zero, eq. (B.21) reduces to

$$1 = (\theta_{[sx]}\delta_{kx}v_y)^2 \Rightarrow \delta_{kx} = \frac{1}{\theta_{[sx]}|v_y|}, \tag{B.23}$$

since we require $\delta_{kx}$ to be positive, as stated in section 3.3.3.

By symmetry, if we define $v'$ as a vertex of $V$ with maximal x ordinate magnitude, then

$$\delta_{ky} = \frac{1}{\theta_{[sy]}|v'_x|}. \tag{B.24}$$

B.3.3 The Scaling Discretisation Parameters $\delta_{sx}$ and $\delta_{sy}$


As in the skewing case, the rotation matrix $R(\theta) = R(\theta^{+sx}(\delta_{sx}))$ has no effect on the magnitude
of the displacement induced on any point by the horizontal scale update parameter $\delta_{sx}$, so we
can omit it again. Defining $q$ as a vertex of $S(\theta)K(\theta)V$ with maximal x ordinate magnitude,
and observing that $K(\theta) = K(\theta^{+sx}(\delta_{sx}))$ and that in the absence of rotation, $\delta_{sx}$ will only
affect the x component of $q$, reduces eq. (B.17) to

$$1 = ((\delta_{sx} - 1)q_x)^2 \Rightarrow \delta_{sx} = 1 + \frac{1}{|q_x|}, \tag{B.25}$$

since $\delta_{sx} > 1$, as stated in section 3.3.3.

Exploiting symmetry once again, it follows that

$$\delta_{sy} = 1 + \frac{1}{|q''_y|}, \tag{B.26}$$

where $q''$ is a vertex of $S(\theta)K(\theta)V$ with maximal y ordinate magnitude.


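B.4 Maximising the Importance Sampling Distribution Entropy


Given an unnormalised discrete distribution $f(x)$ with domain $X$, defined at $x \in X_d \subset X$ and
undefined at $x \in X_u = X - X_d$, we want to find an energy $\psi_u \geq 0$ that maximises $k^{-1}f$’s
entropy $H$, where:

$$f(x) = e^{-\psi_u} \quad \forall x \in X_u, \tag{B.27}$$

$$H = -\sum_{x \in X} k^{-1}f(x)\ln(k^{-1}f(x)), \tag{B.28}$$

$$k = \sum_{x \in X} f(x). \tag{B.29}$$

We will begin by defining $f$’s unnormalised entropy, $H'$, as

$$H' \triangleq -\sum_{x \in X} f(x)\ln(f(x)), \tag{B.30}$$

and making use of the identity:

$$\begin{aligned} H &= -k^{-1}\sum_{x \in X} f(x)\left(\ln(f(x)) - \ln(k)\right) \\ &= k^{-1}H' + \ln(k), \quad \text{by eqs. (B.29) and (B.30)}. \end{aligned} \tag{B.31}$$

From this, it follows that the derivative of $H$ w.r.t. $\psi_u$ satisfies

$$\begin{aligned} \frac{dH}{d\psi_u} &= \frac{d(k^{-1}H')}{d\psi_u} + k^{-1}\frac{dk}{d\psi_u} \\ &= k^{-1}\frac{dH'}{d\psi_u} - k^{-2}H'\frac{dk}{d\psi_u} + k^{-1}\frac{dk}{d\psi_u} \\ &= k^{-1}\left(\frac{dH'}{d\psi_u} + (1 - k^{-1}H')\frac{dk}{d\psi_u}\right). \end{aligned} \tag{B.32}$$

Differentiating $H'$ w.r.t. $\psi_u$ gives

$$\frac{dH'}{d\psi_u} = -\sum_{x \in X}\left(\frac{df(x)}{d\psi_u}\ln(f(x)) + \frac{df(x)}{d\psi_u}\right) = |X_u|(1 - \psi_u)e^{-\psi_u}, \tag{B.33}$$

and since

$$k = \sum_{x \in X_d} f(x) + |X_u|e^{-\psi_u}, \tag{B.34}$$

differentiating $k$ w.r.t. $\psi_u$ gives

$$\frac{dk}{d\psi_u} = -|X_u|e^{-\psi_u}. \tag{B.35}$$

The entropy is maximised when

$$\begin{aligned} \frac{dH}{d\psi_u} = 0 &\Rightarrow (1 - k^{-1}H')\frac{dk}{d\psi_u} = -\frac{dH'}{d\psi_u} \\ &\Rightarrow -|X_u|(1 - k^{-1}H')e^{-\psi_u} = -|X_u|(1 - \psi_u)e^{-\psi_u}. \end{aligned} \tag{B.36}$$

If we let

$$A = \sum_{x \in X_d} f(x), \qquad B = \sum_{x \in X_d} f(x)\ln(f(x)), \tag{B.37}$$

then this becomes

$$\frac{-B + |X_u|\psi_u e^{-\psi_u}}{A + |X_u|e^{-\psi_u}} = \psi_u \Rightarrow -B = A\psi_u \Rightarrow \psi_u = -\frac{B}{A}. \tag{B.38}$$

Finally, we need to check what non-negative value of $\psi_u$ maximises the entropy when $-\frac{B}{A}$ is
negative. Since $k$ is always positive, replacing the $=$ in eq. (B.36) with $<$ and working through
eq. (B.38) with these new inequalities tells us that $\frac{dH}{d\psi_u} < 0 \iff \psi_u > -\frac{B}{A}$, which means that
$H$ is a strictly decreasing function of $\psi_u$ for all $\psi_u > -\frac{B}{A}$. Hence in general, the entropy is
maximised by

$$\psi_u = \max\left(0, -\frac{B}{A}\right). \tag{B.39}$$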
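
The result in eq. (B.39) is easy to check numerically: assign $e^{-\psi_u}$ to every undefined point, normalise, and compare the entropy at the predicted optimum with the entropy at other energies. The sketch below does this for hypothetical values of $f$; the function names are illustrative.

```python
import math


def entropy(f_defined, n_undefined, psi):
    """Entropy of the normalised distribution in which every undefined point
    is assigned the value exp(-psi)."""
    values = list(f_defined) + [math.exp(-psi)] * n_undefined
    k = sum(values)
    return -sum((v / k) * math.log(v / k) for v in values if v > 0)


def optimal_energy(f_defined):
    """Entropy-maximising energy max(0, -B / A) from eq. (B.39)."""
    A = sum(f_defined)
    B = sum(v * math.log(v) for v in f_defined if v > 0)
    return max(0.0, -B / A)


f_defined = [0.9, 0.5, 0.2, 0.05]  # hypothetical observable values of f
psi_best = optimal_energy(f_defined)
best_entropy = entropy(f_defined, 3, psi_best)
grid = [0.05 * i for i in range(101)]  # candidate energies in [0, 5]
grid_entropies = [entropy(f_defined, 3, psi) for psi in grid]
```

Note that $-B/A$ is the $f$-weighted mean of the energies $-\ln f(x)$ over the defined points, and that it does not depend on the number of undefined points.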


B.5 Maximising a Particle Set’s Entropy 


Let $J_u$ be a set of particle indices such that for each $j \in J_u$, the time $t$ patch likelihood $L_t(\Theta_t^{(j)})$ is undefined, and let $J_d$ be the set of indices of particles with defined time $t$ patch likelihoods. Assuming, without loss of generality, that $L_t$ has a maximum value of 1*, we want to find a patch likelihood energy $\psi_u \geq 0$ for each $j \in J_u$ that maximises the resulting entropy $H$ of the particle set.

* Whatever the true maximum value is, rescaling it to 1 makes no difference, since the weights will ultimately be normalised.

By eq. (B.31), $H$ satisfies

\[
\begin{aligned}
H = {} & \frac{-\sum_{j \in J_d} w_t^{(j)} \ln\left(w_t^{(j)}\right) - \sum_{j \in J_u} w_{t-1}^{(j')} p_t(\Theta_t^{(j)}) e^{-\psi_u} \ln\left(w_{t-1}^{(j')} p_t(\Theta_t^{(j)}) e^{-\psi_u}\right)}{\sum_{j \in J_d} w_t^{(j)} + e^{-\psi_u} \sum_{j \in J_u} w_{t-1}^{(j')} p_t(\Theta_t^{(j)})} \\
& + \ln\left( \sum_{j \in J_d} w_t^{(j)} + e^{-\psi_u} \sum_{j \in J_u} w_{t-1}^{(j')} p_t(\Theta_t^{(j)}) \right) ,
\end{aligned} \tag{B.40}
\]

where $w_t^{(j)}$ is the unnormalised time $t$ particle weight (defined as in eq. (3.16)) of a particle with index $j \in J_d$, $w_{t-1}^{(j')}$ is the normalised time $t - 1$ particle weight for each particle $j'$, and $p_t$ is the (uniform) time $t$ prior. For convenience, we use the following substitutions:


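\[
\begin{aligned}
a &= \sum_{j \in J_d} w_t^{(j)} \ln\left(w_t^{(j)}\right) , &
b &= \sum_{j \in J_u} w_{t-1}^{(j')} p_t(\Theta_t^{(j)}) \ln\left(w_{t-1}^{(j')} p_t(\Theta_t^{(j)})\right) , \\
c &= \sum_{j \in J_d} w_t^{(j)} , &
d &= \sum_{j \in J_u} w_{t-1}^{(j')} p_t(\Theta_t^{(j)}) ,
\end{aligned} \tag{B.41}
\]

from which we get

\[ H = \frac{-a - b e^{-\psi_u} + d \psi_u e^{-\psi_u}}{c + d e^{-\psi_u}} + \ln\left(c + d e^{-\psi_u}\right) . \tag{B.42} \]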


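By eq. (B.32), differentiating w.r.t. $\psi_u$ gives

\[
\begin{aligned}
\frac{dH}{d\psi_u}
&= \frac{1}{c + d e^{-\psi_u}} \left( b e^{-\psi_u} - d \psi_u e^{-\psi_u} + d e^{-\psi_u} - d e^{-\psi_u} \left( 1 - \frac{-a - b e^{-\psi_u} + d \psi_u e^{-\psi_u}}{c + d e^{-\psi_u}} \right) \right) \\
&= \frac{e^{-\psi_u} \left( (b - d \psi_u + d)(c + d e^{-\psi_u}) - d \left( a + b e^{-\psi_u} - d \psi_u e^{-\psi_u} + c + d e^{-\psi_u} \right) \right)}{(c + d e^{-\psi_u})^2} \\
&= \frac{e^{-\psi_u} (bc - dc \psi_u - da)}{(c + d e^{-\psi_u})^2} .
\end{aligned} \tag{B.43}
\]

Equating the derivative to 0 then gives

\[ \frac{dH}{d\psi_u} = 0 \implies \psi_u = \frac{bc - ad}{dc} . \tag{B.44} \]

Also, it is clear from the last line of eq. (B.43) that

\[ \psi_u > \frac{bc - ad}{dc} \implies \frac{dH}{d\psi_u} < 0 , \tag{B.45} \]

so the non-negative patch likelihood energy that maximises $H$ always satisfies

\[ \psi_u = \max\left(0, \frac{bc - ad}{dc}\right) . \tag{B.46} \]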
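
As an illustration of eqs. (B.41) to (B.46), the sketch below computes the energy from two lists of weights and compares it with a brute-force maximisation of the particle set entropy. It is a hedged example rather than the thesis's implementation: `defined_weights` stands for the unnormalised weights $w_t^{(j)}$, $j \in J_d$, `undefined_factors` stands for the products $w_{t-1}^{(j')} p_t(\Theta_t^{(j)})$, $j \in J_u$, and both names and all numerical values are hypothetical.

```python
import math

def particle_set_energy(defined_weights, undefined_factors):
    """psi_u = max(0, (bc - ad) / (dc)), using the substitutions of eq. (B.41)."""
    a = sum(w * math.log(w) for w in defined_weights)
    b = sum(v * math.log(v) for v in undefined_factors)
    c = sum(defined_weights)
    d = sum(undefined_factors)
    return max(0.0, (b * c - a * d) / (d * c))

def particle_set_entropy(defined_weights, undefined_factors, psi_u):
    """Entropy of the normalised weights when undefined likelihoods are exp(-psi_u)."""
    weights = list(defined_weights) + [v * math.exp(-psi_u) for v in undefined_factors]
    k = sum(weights)
    return -sum((w / k) * math.log(w / k) for w in weights)

defined_weights = [0.3, 0.1, 0.05]   # hypothetical w_t^(j), j in J_d
undefined_factors = [0.5, 0.4]       # hypothetical w_{t-1}^(j') p_t, j in J_u
psi_star = particle_set_energy(defined_weights, undefined_factors)

# Brute-force comparison on a grid of candidate energies in [0, 5].
grid = [i * 0.001 for i in range(5001)]
psi_grid = max(
    grid,
    key=lambda psi: particle_set_entropy(defined_weights, undefined_factors, psi),
)
```

When every entry of `undefined_factors` equals 1, $b = 0$ and the expression reduces to $-a/c$, which is the result of eq. (B.39).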


Appendix C 


Algorithms 


Function PropagateParticles (page 248) and its helpers below are a pseudocode implementation of the whole importance sampling algorithm that we proposed in chapter 3.

Function CalcDeltaEMD (page 254) and its helper below are a pseudocode implementation of the dissimilarity function that we proposed in chapter 4, which calculates the Earth Mover's Distance between a weighted histogram of squared pixel differences and the Dirac delta function.
Function PropagateParticles(Θ_{t−1}^{(1:n)}, w_{t−1}^{(1:n)})

Given the time t − 1 particle set, this function samples the time t particle states.
Input : The time t − 1 particle states Θ_{t−1}^{(1:n)} and weights w_{t−1}^{(1:n)}.
Output: The time t particle states Θ_t^{(1:n)}.

(H, G) := CalcGroups(Θ_{t−1}^{(1:n)}, w_{t−1}^{(1:n)})
// Ψ will store the cached discrete, bounded-displacement energy functions.
Ψ := ∅
// D will store the cached displacement sampling distributions.
D := ∅
foreach Group key k of H do
    if ¬ContainsKey(D, k) then
        // Create and cache displacement sampling distribution.
        foreach Grid point g ∈ G[k] do
            if ¬ContainsKey(Ψ, g) then
                // Create and cache discrete, bounded-displacement energy function.
                ψ_t(·; g) := discrete, bounded-displacement energy function, constructed as in eq. (3.23).
                Ψ[g] := ψ_t(·; g)
            end
        end
        D[k] := CalcDispDistr(G[k], Ψ)
    end

    // Propagate particles
    foreach (i, Θ, w) ∈ H[k] do
        // NOTE: each ref. to U denotes a new instance of a uniform random
        // variable sampled over [0, 1) .
        Θ_t^{(i)} := Θ
        d' ~ D[k]
        Θ_{t,[d_x,d_y]}^{(i)} += d' + (U − 0.5, U − 0.5)^T δ_d
        φ' ~ CalcRotDistr(G[k], Ψ, Θ_t^{(i)})
        Θ_{t,[φ]}^{(i)} += φ' + (U − 0.5) δ_φ
        κ' ~ CalcSkewDistr(G[k], Ψ, Θ_t^{(i)})
        Θ_{t,[κ_x,κ_y]}^{(i)} += κ' + ((U − 0.5) δ_{κ_x}, (U − 0.5) δ_{κ_y})^T
        S' ~ CalcScaleDistr(G[k], Ψ, Θ_t^{(i)})
        Θ_{t,[s_x,s_y]}^{(i)} := diag(δ_{s_x}^{U−0.5}, δ_{s_y}^{U−0.5}) S' Θ_{t,[s_x,s_y]}^{(i)}
    end
end
return Θ_t^{(1:n)}
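
Each propagation step in PropagateParticles draws a state from a discrete sampling distribution and then adds uniform jitter so that the sample is not restricted to the discrete grid. The Python sketch below illustrates this pattern for a single scalar parameter; the function name, the `cell_width` parameter (playing the role of a jitter width such as δ_φ) and the example values are assumptions made for illustration only.

```python
import random

def sample_with_jitter(states, probs, cell_width, rng=random):
    """Draw a discrete state, then jitter it uniformly within one cell width."""
    # random.choices accepts unnormalised, non-negative weights.
    (state,) = rng.choices(states, weights=probs, k=1)
    # rng.random() is uniform over [0, 1), matching the variable U above.
    return state + (rng.random() - 0.5) * cell_width

rng = random.Random(0)              # fixed seed, for reproducibility only
states = [-1.0, 0.0, 1.0]           # hypothetical discrete rotation offsets
probs = [0.2, 0.5, 0.3]             # hypothetical sampling distribution
samples = [sample_with_jitter(states, probs, 0.5, rng) for _ in range(1000)]
```

For the two-dimensional displacement and skew updates, the same jitter would be applied independently to each component.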
Function CalcGroups(Θ_{t−1}^{(1:n)}, w_{t−1}^{(1:n)})

This function calculates particle hash keys and groups the particles together.

Input : The time t − 1 particle states Θ_{t−1}^{(1:n)} and weights w_{t−1}^{(1:n)}.
Output: The particle groups as a hash table H, and the group grid points as a hash table G.

// Group particles and store the groups in hash table H.
H := ∅
for i := 1 to n do
    k := key defined by grid squares containing particle patch vertices V(Θ_{t−1}^{(i)})
    H[k] := H[k] ∪ {(i, Θ_{t−1}^{(i)}, w_{t−1}^{(i)})}
end
// Calculate grid points for each group and store them in hash table G.
G := ∅
foreach Key k of H do
    A := 0
    d := 0
    W := 0
    // Average group particles.
    foreach (·, Θ, w) ∈ H[k] do
        A += w log(A(Θ))
        d += w d(Θ)
        W += w
    end
    A := exp(A / W)
    d := d / W
    Θ̄ := transformation parameters defined by d and RSK decomposition of A .
    G[k] := set of grid points contained in patch V(Θ̄) .
end
return (H, G)
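
The grouping step of CalcGroups can be sketched with an ordinary dictionary. The example below is a simplified illustration under stated assumptions: each particle is represented only by the list of its patch vertices, the hash key is taken to be the tuple of grid-square indices containing those vertices, and `grid_size` is a hypothetical grid spacing. The weighted averaging of the affine parameters is not reproduced here.

```python
import math
from collections import defaultdict

def group_key(vertices, grid_size):
    """Key built from the grid squares that contain the patch vertices."""
    return tuple(
        (math.floor(x / grid_size), math.floor(y / grid_size)) for x, y in vertices
    )

def group_particles(patches, weights, grid_size):
    """Group (index, vertices, weight) triples by their grid-square key."""
    groups = defaultdict(list)
    for i, (vertices, w) in enumerate(zip(patches, weights)):
        groups[group_key(vertices, grid_size)].append((i, vertices, w))
    return dict(groups)

patches = [
    [(0.1, 0.2), (3.9, 0.3), (3.8, 3.7), (0.2, 3.9)],   # particle 0
    [(0.4, 0.6), (3.5, 0.1), (3.6, 3.9), (0.3, 3.5)],   # particle 1, same squares
    [(1.2, 0.2), (5.1, 0.3), (5.2, 3.7), (1.3, 3.9)],   # particle 2, shifted
]
groups = group_particles(patches, [0.5, 0.3, 0.2], grid_size=1.0)
```

Particles that share a key can then share one cached displacement sampling distribution, which is the purpose of the hash table D in PropagateParticles.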




Function CalcDispDistr(G, Ψ)
This function combines discrete, bounded-displacement energy functions to construct a
displacement sampling distribution.

Input : A set of grid points G and a hash table Ψ of the discrete,
        bounded-displacement energy functions for each point.
Output: A displacement sampling distribution f over domain D, defined as in eq. (3.22).

Y := InitDistr(D)
foreach Grid point g ∈ G do
    l := true if and only if g is the last element of G .
    foreach d' ∈ D do
        // Add energy Ψ[g](d') to those so far accumulated for state d' .
        Y := UpdateEnergy(Ψ[g](d'), d', Y, l)
    end
end
f := ConvEnergiesToProbs(Y, D)
return f


Function InitDistr(X)
This function initialises and returns a structure of temporary variables used in the
calculation of the sampling distributions.

Input : A finite, discrete domain X.
Output: A structure Y of initialised temporary variables.

// Y.ψ is a map from X to R that accumulates the sampling distribution energies
// for each x ∈ X.
Y.ψ(X) := 0

// Y.λ is a map from X to N that accumulates the number of defined energy
// terms for each x ∈ X.
Y.λ(X) := 0

// Y.n_u is used to count the number of states in X with no defined energies.
Y.n_u := 0

// Y.N_ψ is used to count the total number of defined energies, over all states.
Y.N_ψ := 0

// Y.ψ↓ is used to keep track of the smallest mean energy so far assigned to a
// state in X.
Y.ψ↓ := ∞

return Y
Function CalcRotDistr(G, Ψ, Θ)
This function combines discrete, bounded-displacement energy functions to construct a
rotation sampling distribution.

Input : A set of grid points G, a hash table Ψ of the discrete, bounded-displacement
        energy functions for each point, and a particle state Θ.
Output: A rotation sampling distribution f over domain R, defined as in eq. (3.29).

Y := InitDistr(R)
foreach Grid point g ∈ G do
    l := true if and only if g is the last element of G .
    foreach φ' ∈ R do
        Θ' := Θ
        Θ'_{[φ]} += φ'
        d := M(g; Θ, Θ') − g
        ψ := result of bilinearly interpolating Ψ[g] at d, or ⊥ if d ∉ D .
        // Add energy ψ to those so far accumulated for state φ' .
        Y := UpdateEnergy(ψ, φ', Y, l)
    end
end
f := ConvEnergiesToProbs(Y, R)
return f
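
The interpolation step in CalcRotDistr returns either an interpolated energy or the undefined value ⊥. A minimal Python sketch of that behaviour follows. It assumes, purely for illustration, that the energy function is stored as a rectangular table of at least 2 × 2 defined values indexed from zero, and it represents ⊥ by `None`; the thesis's displacement domain D may be laid out differently.

```python
import math

def bilinear_or_undefined(table, dx, dy):
    """Bilinearly interpolate table[row][col] at (dx, dy); None outside the table."""
    rows, cols = len(table), len(table[0])
    if not (0 <= dx <= cols - 1 and 0 <= dy <= rows - 1):
        return None  # displacement falls outside the bounded domain
    # Clamp the cell origin so that points on the far edge use the last cell.
    x0 = min(math.floor(dx), cols - 2)
    y0 = min(math.floor(dy), rows - 2)
    tx, ty = dx - x0, dy - y0
    top = (1 - tx) * table[y0][x0] + tx * table[y0][x0 + 1]
    bottom = (1 - tx) * table[y0 + 1][x0] + tx * table[y0 + 1][x0 + 1]
    return (1 - ty) * top + ty * bottom

energies = [[0.0, 1.0],
            [2.0, 3.0]]   # hypothetical 2 x 2 table of energies
centre = bilinear_or_undefined(energies, 0.5, 0.5)
outside = bilinear_or_undefined(energies, 1.5, 0.0)
```

Returning `None` rather than a large penalty keeps undefined terms out of the accumulated energies, so that they can be handled separately as in UpdateEnergy.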


Function CalcSkewDistr(G, Ψ, Θ)
This function combines discrete, bounded-displacement energy functions to construct a
skew sampling distribution.
Input : A set of grid points G, a hash table Ψ of the discrete, bounded-displacement
        energy functions for each point, and a particle state Θ.
Output: A skew sampling distribution f over domain K, defined as in eq. (3.31).
// This function is almost identical to CalcRotDistr. The only difference is that K
// is used instead of R, and the line in which Θ'_{[φ]} is updated should be replaced with:
//     Θ'_{[κ_x,κ_y]} += κ'
// where κ' ∈ K is the inner iteration variable.


Function CalcScaleDistr(G, Ψ, Θ)
This function combines discrete, bounded-displacement energy functions to construct a
scale sampling distribution.
Input : A set of grid points G, a hash table Ψ of the discrete, bounded-displacement
        energy functions for each point, and a particle state Θ.
Output: A scale sampling distribution f over domain S, defined as in eq. (3.32).
// This function is almost identical to CalcRotDistr. The only difference is that S
// is used instead of R, and the line in which Θ'_{[φ]} is updated should be replaced with:
//     Θ'_{[s_x,s_y]} := S' Θ'_{[s_x,s_y]}
// where S' ∈ S is the inner iteration variable.
Function UpdateEnergy(ψ, x, Y, l)
This function updates the accumulated energies for a sampling distribution.
Input : A new energy value ψ to be added to the energies for state x of a sampling
        distribution's domain, a structure Y of temporary variables created by the
        InitDistr function, and a boolean l which is true if and only if ψ is the last
        energy term for x.
Output: The updated temporary variable structure Y.

if ψ ≠ ⊥ then
    // ψ is defined.
    Y.λ(x) ++
    Y.ψ(x) += ψ
    Y.N_ψ ++
end
else if l ∧ Y.λ(x) = 0 then
    // x has no defined energies.
    Y.ψ(x) := ⊥
    Y.n_u ++
end

if l ∧ Y.λ(x) > 0 then
    // All of x's energies have been accumulated.
    Y.ψ(x) := Y.ψ(x) / Y.λ(x)
    if Y.ψ(x) < Y.ψ↓ then
        // x has minimal mean energy.
        Y.ψ↓ := Y.ψ(x)
    end
end

return Y
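
The bookkeeping performed by InitDistr and UpdateEnergy can be sketched in Python as follows. This is an illustrative rendering under assumptions rather than a transcription: the structure Y is a plain dictionary, ⊥ is represented by `None`, the field names are hypothetical, and incrementing the total count `n_psi` whenever a defined energy arrives is an assumption inferred from the description of that counter.

```python
import math

def init_distr(domain):
    """Create the accumulator structure for a finite, discrete domain."""
    return {
        "psi": {x: 0.0 for x in domain},   # accumulated (later mean) energies
        "lam": {x: 0 for x in domain},     # number of defined terms per state
        "n_u": 0,                          # states with no defined energies
        "n_psi": 0,                        # total number of defined energies (assumed)
        "psi_min": math.inf,               # smallest mean energy seen so far
    }

def update_energy(psi, x, Y, last):
    """Add one energy term for state x; finalise x when `last` is true."""
    if psi is not None:
        Y["lam"][x] += 1
        Y["psi"][x] += psi
        Y["n_psi"] += 1
    elif last and Y["lam"][x] == 0:
        Y["psi"][x] = None
        Y["n_u"] += 1
    if last and Y["lam"][x] > 0:
        Y["psi"][x] /= Y["lam"][x]
        Y["psi_min"] = min(Y["psi_min"], Y["psi"][x])
    return Y

# Two hypothetical grid points contributing energies to a three-state domain.
terms = [{0: 1.0, 1: None, 2: None}, {0: 3.0, 1: 2.0, 2: None}]
Y = init_distr([0, 1, 2])
for n, energies in enumerate(terms):
    for x, psi in energies.items():
        Y = update_energy(psi, x, Y, last=(n == len(terms) - 1))
```

In this example state 0 receives two defined terms, state 1 receives one, and state 2 receives none, so state 2 is the only one marked as undefined.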
Function ConvEnergiesToProbs(Y, X)
This function converts the accumulated energies into a sampling distribution.
Input : A structure Y with a field Y.ψ that maps domain X to accumulated energies.
Output: A sampling function f over domain X defined in terms of the accumulated
        energies.

// η is the mean number of defined energies per state of X with at least one
// defined energy.
η := Y.N_ψ / (|X| − Y.n_u)

// ρ stores the entropy of the unnormalised sampling distribution.
ρ := 0
// F stores f's normalising constant.
F := 0
foreach x ∈ X do
    if Y.ψ(x) ≠ ⊥ then
        // Y.ψ(x) is defined.
        f(x) := exp{−η(Y.ψ(x) − Y.ψ↓)}
        ρ −= f(x) ln(f(x))
        F += f(x)
    end
end

if Y.n_u > 0 then
    // Handle states with no defined energies.
    if Y.n_u < |X| then
        ψ_u := ρ / F
    end
    else
        ψ_u := 0
    end

    p_u := exp{−ψ_u}
    foreach x ∈ X do
        if Y.ψ(x) = ⊥ then
            f(x) := p_u
        end
    end
end

return f
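
The conversion from mean energies to unnormalised probabilities, including the entropy-maximising value assigned to undefined states, might look as follows in Python. This is a sketch under assumptions: mean energies are passed as a dictionary with `None` marking undefined states, and `eta` and `psi_min` are supplied by the caller instead of being read from the structure Y. Since every defined f(x) is at most 1, the ratio ρ/F is non-negative, so no explicit clamp at zero is applied.

```python
import math

def conv_energies_to_probs(mean_energy, eta, psi_min):
    """Map mean energies to unnormalised probabilities; None marks undefined."""
    f = {}
    rho = 0.0   # unnormalised entropy of the defined part, -sum f ln f
    F = 0.0     # sum of the defined values
    for x, energy in mean_energy.items():
        if energy is not None:
            f[x] = math.exp(-eta * (energy - psi_min))
            rho -= f[x] * math.log(f[x])
            F += f[x]
    undefined = [x for x, energy in mean_energy.items() if energy is None]
    if undefined:
        # Entropy-maximising energy for undefined states, cf. eq. (B.39).
        psi_u = rho / F if F > 0 else 0.0
        for x in undefined:
            f[x] = math.exp(-psi_u)
    return f

f = conv_energies_to_probs({0: 0.0, 1: 1.0, 2: None}, eta=1.0, psi_min=0.0)
```

With these hypothetical inputs the undefined state receives a value between those of the two defined states, which is the behaviour one would expect from a maximum-entropy assignment.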
Function CalcDeltaEMD(x_s, x_t, N, p)
This function calculates the EMD between the delta function and the weighted histogram
of squared pixel differences.

Input : Lists x_s and x_t of pixel locations for times s and t respectively, the number of
        histogram bins N, and a value p ∈ (0, 1] that determines what percentile to use
        as an upper bound.

Output: The EMD δ, and the total histogram mass h⁺ = Σ_i w(i) .

(h', h⁺, d²↓, W) := CalcUnnormHist(x_s, x_t, N)

// τ_h is the normalising factor for the bottom 100p% of h'.

τ_h := h⁺ p


// Loop invariant: Hs = Ss HG) ass Ht = eS ANG); 


// where H'(a) & ya 7) 
while i < N A Hj,,, <7 do 


k:=%4 
Ua i= ly 
k-1 k 
_ gyi 
aH! = Hs 
a++ 
/ — Lf 
Meas = h(i) 
| ones / 
A; a Ay, 
/ __ bi 
Ay = Nea 
end 


// Post-condition: h,,, =W(k+1), Hi= 5.80) < m, AL, =a), 
TPO Lae Ais ; 


Mey +s 
— Ik — k-1 k 


Th 


if p< 1 then 


1\2 
HY. 
Th (10-7) 
— 


ae aa 
else 

//k=N-—1, but its value should be N . 
ce 

Th 


o+=2 


end 
pan Wo 


return (0,h*) 
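
The general idea behind CalcDeltaEMD can be illustrated with a short Python sketch. It relies on the standard fact that the EMD between a one-dimensional distribution and a point mass is the mean distance of the mass from that point. Everything else is an assumption made for illustration and may differ from the pseudocode above: the point mass is placed at the lower edge of the first bin, each bin's mass is treated as sitting at the bin centre, and only the bottom 100p% of the mass is kept, with the last bin included partially.

```python
def delta_emd(hist, bin_width, p=1.0):
    """Mean distance from the lower histogram edge of the bottom 100p% of mass."""
    total = sum(hist)
    tau = p * total            # mass retained after percentile truncation
    moved = 0.0                # accumulated mass times distance
    used = 0.0                 # mass consumed so far
    for i, mass in enumerate(hist):
        remaining = tau - used
        if remaining <= 0:
            break
        take = min(mass, remaining)
        moved += take * (i + 0.5) * bin_width   # bin centre as the distance
        used += take
    return moved / tau

hist = [3.0, 3.0, 0.0, 4.0]     # hypothetical unnormalised weighted histogram
full = delta_emd(hist, bin_width=1.0, p=1.0)
half = delta_emd(hist, bin_width=1.0, p=0.5)
```

Discarding the upper tail lowers the distance whenever the largest differences carry mass, which is the kind of insensitivity to outlying pixel differences that the percentile bound is meant to provide.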
Function CalcUnnormHist(x_s, x_t, N)
This function calculates a weighted unnormalised histogram over the squared Euclidean
differences between image pixels.
Input : The lists of time s pixels x_s and time t pixels x_t, and the number of histogram
        bins N.
Output: The unnormalised histogram h', its total mass h⁺, the minimum squared
        Euclidean difference d²↓, and the bin width W.

d²(·) := a length |x_s| array of real numbers.
d²↓ := ∞
d²↑ := −∞

for i := 1 to |x_s| do
    d²(i) := d²(x_s(i), x_t(i))
    if d²(i) < d²↓ then
        d²↓ := d²(i)
    end
    if d²(i) > d²↑ then
        d²↑ := d²(i)
    end
end
W := (d²↑ − d²↓) / N
h'(1 : N) := 0
h⁺ := 0

for i := 1 to |x_s| do
    j := max(1, ⌈(d²(i) − d²↓) / W⌉)
    w' := w(x_s(i), x_t(i))
    h'(j) += w'
    h⁺ += w'
end
return (h', h⁺, d²↓, W)
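
A compact Python version of the histogram construction is sketched below. To keep the example self-contained it takes precomputed squared differences and weights as inputs, which is an assumption; in the pseudocode these come from d²(x_s(i), x_t(i)) and w(x_s(i), x_t(i)). The guards for a zero bin width and for round-off at the upper edge are additions that the pseudocode does not mention.

```python
import math

def unnorm_hist(sq_diffs, weights, n_bins):
    """Weighted, unnormalised histogram of squared differences."""
    d_min, d_max = min(sq_diffs), max(sq_diffs)
    width = (d_max - d_min) / n_bins
    hist = [0.0] * n_bins
    total = 0.0
    for d2, w in zip(sq_diffs, weights):
        # 1-based bin index: the minimum value falls in bin 1, the maximum in bin N.
        j = max(1, math.ceil((d2 - d_min) / width)) if width > 0 else 1
        j = min(j, n_bins)   # guard against floating-point round-off
        hist[j - 1] += w
        total += w
    return hist, total, d_min, width

hist, total, d_min, width = unnorm_hist(
    [0.0, 1.0, 2.0, 4.0],    # hypothetical squared pixel differences
    [1.0, 2.0, 3.0, 4.0],    # hypothetical pixel weights
    n_bins=4,
)
```

Because the bin index is obtained with a ceiling, a value lying exactly on a bin boundary is assigned to the lower of the two bins.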